Retrosynthesis prediction enhanced by in-silico reaction data augmentation

Published 31 Jan 2024 in cs.LG and cs.AI | (2402.00086v1)

Abstract: Recent advances in machine learning (ML) have expedited retrosynthesis research by helping chemists design experiments more efficiently. However, all ML-based methods consume substantial amounts of paired training data (i.e., chemical reactions given as product-reactant(s) pairs), which are costly to obtain. Moreover, companies view reaction data as a valuable asset and restrict researchers' access to it. Because retrosynthesis models are data-driven, these issues prevent the creation of more powerful ones. In response, we exploit easy-to-access unpaired data (i.e., one component of a product-reactant(s) pair) to generate in-silico paired data that facilitates model training. Specifically, we present RetroWISE, a self-boosting framework that employs a base model inferred from real paired data to perform in-silico reaction generation and augmentation using unpaired data, ultimately leading to a superior model. On three benchmark datasets, RetroWISE achieves the best overall performance against state-of-the-art models (e.g., +8.6% top-1 accuracy on the USPTO-50K test dataset). Moreover, it consistently improves the prediction accuracy of rare transformations. These results show that RetroWISE overcomes the training bottleneck by using in-silico reactions, thereby paving the way toward more effective ML-based retrosynthesis models.
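
The self-boosting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `self_boost`, the toy trainers, and the choice of unpaired reactants (rather than unpaired products) as the starting point are all assumptions made for illustration, and the "models" are placeholders with no chemistry in them.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]        # (product, reactants), e.g. as SMILES strings
Model = Callable[[str], str]  # maps one component of a pair to the other


def self_boost(
    real_pairs: List[Pair],
    unpaired_reactants: Iterable[str],
    train_base: Callable[[List[Pair]], Model],
    train_retro: Callable[[List[Pair]], Model],
) -> Tuple[Model, List[Pair]]:
    """Hypothetical outline of a self-boosting loop (assumed, not from the paper)."""
    # 1. Fit a base model on the real paired data only.
    #    Assumption: it predicts a product from reactants.
    base = train_base(real_pairs)
    # 2. Use the base model to complete each unpaired item into an in-silico pair.
    in_silico = [(base(r), r) for r in unpaired_reactants]
    # 3. Augment the real data with the in-silico pairs and train the final model.
    final = train_retro(real_pairs + in_silico)
    return final, in_silico


def toy_train_base(pairs: List[Pair]) -> Model:
    # Placeholder "forward model": joins reactant strings, ignores the training data.
    return lambda reactants: reactants.replace(".", "")


def toy_train_retro(pairs: List[Pair]) -> Model:
    # Placeholder "retrosynthesis model": a lookup table from product to reactants.
    table = {product: reactants for product, reactants in pairs}
    return lambda product: table.get(product, "")


real = [("AB", "A.B")]
model, synthetic = self_boost(real, ["C.D"], toy_train_base, toy_train_retro)
```

In this toy run the final model can answer for the product `CD` only because the in-silico pair was added to its training set, which is the effect the framework aims for at scale. A real pipeline would replace both trainers with learned sequence or graph models and would likely filter the generated pairs for quality.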

Authors (4)
Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. 
Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Cereto-Massagué A, Ojeda MJ, Valls C, et al (2015) Molecular fingerprint similarity search in virtual screening. Methods 71:58–63 Chen and Jung (2021) Chen S, Jung Y (2021) Deep retrosynthetic reaction prediction using local reactivity and global attention. JACS Au 1(10):1612–1620 Coley et al (2017) Coley CW, Rogers L, Green WH, et al (2017) Computer-assisted retrosynthesis based on molecular similarity. 
ACS central science 3(12):1237–1245 Coley et al (2019) Coley CW, Green WH, Jensen KF (2019) Rdchiral: An rdkit wrapper for handling stereochemistry in retrosynthetic template extraction and application. Journal of chemical information and modeling 59(6):2529–2537 Corey and Wipke (1969) Corey EJ, Wipke WT (1969) Computer-assisted design of complex organic syntheses: Pathways for molecular synthesis can be devised with a computer and equipment for graphical communication. Science 166(3902):178–192 Corey et al (1985) Corey EJ, Long AK, Rubenstein SD (1985) Computer-assisted analysis in organic synthesis. Science 228(4698):408–418 Dai et al (2019) Dai H, Li C, Coley C, et al (2019) Retrosynthesis prediction with conditional graph logic network. In: Advances in Neural Information Processing Systems Dubrovskiy et al (2018) Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. 
Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. 
ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. 
Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. 
Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Chen S, Jung Y (2021) Deep retrosynthetic reaction prediction using local reactivity and global attention. JACS Au 1(10):1612–1620 Coley et al (2017) Coley CW, Rogers L, Green WH, et al (2017) Computer-assisted retrosynthesis based on molecular similarity. ACS central science 3(12):1237–1245 Coley et al (2019) Coley CW, Green WH, Jensen KF (2019) Rdchiral: An rdkit wrapper for handling stereochemistry in retrosynthetic template extraction and application. Journal of chemical information and modeling 59(6):2529–2537 Corey and Wipke (1969) Corey EJ, Wipke WT (1969) Computer-assisted design of complex organic syntheses: Pathways for molecular synthesis can be devised with a computer and equipment for graphical communication. Science 166(3902):178–192 Corey et al (1985) Corey EJ, Long AK, Rubenstein SD (1985) Computer-assisted analysis in organic synthesis. Science 228(4698):408–418 Dai et al (2019) Dai H, Li C, Coley C, et al (2019) Retrosynthesis prediction with conditional graph logic network. In: Advances in Neural Information Processing Systems Dubrovskiy et al (2018) Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. 
Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. 
Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. 
In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. 
Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. 
Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. 
p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. 
Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. 
In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. 
p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). 
doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. 
Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. 
Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. 
Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert Opinion on Drug Discovery 11(2):137–148
Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity – a review. QSAR & Combinatorial Science 22(9-10):1006–1026
Paszke A, Gross S, Massa F, et al (2019) PyTorch: An imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems
Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8
Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of Chemical Information and Modeling 50(5):742–754
Sacha M, Błaż M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284
Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of Chemical Information and Modeling 56(12):2336–2346
Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature Machine Intelligence 3(2):144–152
Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971
Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555(7698):604–610
Seo SW, Song YY, Yang JY, et al (2021) GTA: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539
Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International Conference on Machine Learning, pp 8818–8827
Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415
Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15(1):1929–1958
Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10186–10194
Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis. Nature Communications 11(1):5575
Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494
Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of Chemical Information and Modeling 62(15):3503–3513
Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature Communications 13(1):1186
Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in Neural Information Processing Systems
Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22475–22490
Wang X, Li Y, Qiu J, et al (2021) RetroPrime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129845
Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences 28(1):31–36
Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of Chemical Information and Computer Sciences 38(6):983–996
Yan C, Ding Q, Zhao P, et al (2020) RetroXpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11248–11258
Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94
Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151
Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009
Zhong Z, Song J, Feng Z, et al (2022) Root-aligned SMILES: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Cereto-Massagué A, Ojeda MJ, Valls C, et al (2015) Molecular fingerprint similarity search in virtual screening. Methods 71:58–63 Chen and Jung (2021) Chen S, Jung Y (2021) Deep retrosynthetic reaction prediction using local reactivity and global attention. JACS Au 1(10):1612–1620 Coley et al (2017) Coley CW, Rogers L, Green WH, et al (2017) Computer-assisted retrosynthesis based on molecular similarity. 
ACS central science 3(12):1237–1245 Coley et al (2019) Coley CW, Green WH, Jensen KF (2019) Rdchiral: An rdkit wrapper for handling stereochemistry in retrosynthetic template extraction and application. Journal of chemical information and modeling 59(6):2529–2537 Corey and Wipke (1969) Corey EJ, Wipke WT (1969) Computer-assisted design of complex organic syntheses: Pathways for molecular synthesis can be devised with a computer and equipment for graphical communication. Science 166(3902):178–192 Corey et al (1985) Corey EJ, Long AK, Rubenstein SD (1985) Computer-assisted analysis in organic synthesis. Science 228(4698):408–418 Dai et al (2019) Dai H, Li C, Coley C, et al (2019) Retrosynthesis prediction with conditional graph logic network. In: Advances in Neural Information Processing Systems Dubrovskiy et al (2018) Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. 
Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. 
ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. 
Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. 
Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Chen S, Jung Y (2021) Deep retrosynthetic reaction prediction using local reactivity and global attention. JACS Au 1(10):1612–1620 Coley et al (2017) Coley CW, Rogers L, Green WH, et al (2017) Computer-assisted retrosynthesis based on molecular similarity. ACS central science 3(12):1237–1245 Coley et al (2019) Coley CW, Green WH, Jensen KF (2019) Rdchiral: An rdkit wrapper for handling stereochemistry in retrosynthetic template extraction and application. Journal of chemical information and modeling 59(6):2529–2537 Corey and Wipke (1969) Corey EJ, Wipke WT (1969) Computer-assisted design of complex organic syntheses: Pathways for molecular synthesis can be devised with a computer and equipment for graphical communication. Science 166(3902):178–192 Corey et al (1985) Corey EJ, Long AK, Rubenstein SD (1985) Computer-assisted analysis in organic synthesis. Science 228(4698):408–418 Dai et al (2019) Dai H, Li C, Coley C, et al (2019) Retrosynthesis prediction with conditional graph logic network. In: Advances in Neural Information Processing Systems Dubrovskiy et al (2018) Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. 
Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. 
Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. 
In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. 
Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. 
Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. 
p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. 
Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. 
In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. 
p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). 
doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. 
Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. 
Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. 
Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
  3. Born J, Manica M (2023) Regression transformer enables concurrent sequence regression and generation for molecular language modelling. Nature Machine Intelligence 5(4):432–444
In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Chen S, Jung Y (2021) Deep retrosynthetic reaction prediction using local reactivity and global attention. JACS Au 1(10):1612–1620 Coley et al (2017) Coley CW, Rogers L, Green WH, et al (2017) Computer-assisted retrosynthesis based on molecular similarity. ACS central science 3(12):1237–1245 Coley et al (2019) Coley CW, Green WH, Jensen KF (2019) Rdchiral: An rdkit wrapper for handling stereochemistry in retrosynthetic template extraction and application. Journal of chemical information and modeling 59(6):2529–2537 Corey and Wipke (1969) Corey EJ, Wipke WT (1969) Computer-assisted design of complex organic syntheses: Pathways for molecular synthesis can be devised with a computer and equipment for graphical communication. Science 166(3902):178–192 Corey et al (1985) Corey EJ, Long AK, Rubenstein SD (1985) Computer-assisted analysis in organic synthesis. Science 228(4698):408–418 Dai et al (2019) Dai H, Li C, Coley C, et al (2019) Retrosynthesis prediction with conditional graph logic network. 
In: Advances in Neural Information Processing Systems Dubrovskiy et al (2018) Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. 
Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. 
Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Coley CW, Rogers L, Green WH, et al (2017) Computer-assisted retrosynthesis based on molecular similarity. ACS central science 3(12):1237–1245 Coley et al (2019) Coley CW, Green WH, Jensen KF (2019) Rdchiral: An rdkit wrapper for handling stereochemistry in retrosynthetic template extraction and application. 
Journal of chemical information and modeling 59(6):2529–2537 Corey and Wipke (1969) Corey EJ, Wipke WT (1969) Computer-assisted design of complex organic syntheses: Pathways for molecular synthesis can be devised with a computer and equipment for graphical communication. Science 166(3902):178–192 Corey et al (1985) Corey EJ, Long AK, Rubenstein SD (1985) Computer-assisted analysis in organic synthesis. Science 228(4698):408–418 Dai et al (2019) Dai H, Li C, Coley C, et al (2019) Retrosynthesis prediction with conditional graph logic network. In: Advances in Neural Information Processing Systems Dubrovskiy et al (2018) Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. 
In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. 
Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. 
p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. 
In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. 
p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). 
doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. 
In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. 
Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. 
Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
  4. Castro E, Godavarthi A, Rubinfien J, et al (2022) Transformer-based protein generation with regularized latent space optimization. Nature Machine Intelligence 4(10):840–851
Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Coley CW, Green WH, Jensen KF (2019) Rdchiral: An rdkit wrapper for handling stereochemistry in retrosynthetic template extraction and application. Journal of chemical information and modeling 59(6):2529–2537 Corey and Wipke (1969) Corey EJ, Wipke WT (1969) Computer-assisted design of complex organic syntheses: Pathways for molecular synthesis can be devised with a computer and equipment for graphical communication. Science 166(3902):178–192 Corey et al (1985) Corey EJ, Long AK, Rubenstein SD (1985) Computer-assisted analysis in organic synthesis. Science 228(4698):408–418 Dai et al (2019) Dai H, Li C, Coley C, et al (2019) Retrosynthesis prediction with conditional graph logic network. In: Advances in Neural Information Processing Systems Dubrovskiy et al (2018) Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. 
Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. 
In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Corey EJ, Wipke WT (1969) Computer-assisted design of complex organic syntheses: Pathways for molecular synthesis can be devised with a computer and equipment for graphical communication. Science 166(3902):178–192 Corey et al (1985) Corey EJ, Long AK, Rubenstein SD (1985) Computer-assisted analysis in organic synthesis. Science 228(4698):408–418 Dai et al (2019) Dai H, Li C, Coley C, et al (2019) Retrosynthesis prediction with conditional graph logic network. In: Advances in Neural Information Processing Systems Dubrovskiy et al (2018) Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. 
Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. 
In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. 
Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. 
In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. 
PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. 
Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. 
Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. 
Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
  5. Cereto-Massagué A, Ojeda MJ, Valls C, et al (2015) Molecular fingerprint similarity search in virtual screening. Methods 71:58–63
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Coley CW, Rogers L, Green WH, et al (2017) Computer-assisted retrosynthesis based on molecular similarity. ACS central science 3(12):1237–1245 Coley et al (2019) Coley CW, Green WH, Jensen KF (2019) Rdchiral: An rdkit wrapper for handling stereochemistry in retrosynthetic template extraction and application. 
In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. 
PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Dai H, Li C, Coley C, et al (2019) Retrosynthesis prediction with conditional graph logic network. In: Advances in Neural Information Processing Systems Dubrovskiy et al (2018) Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. 
In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. 
Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. 
Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. 
Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. 
In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. 
p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). 
doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. 
In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364
Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113
Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1
Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge
Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204
Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell RNA-seq data using generative adversarial networks. Nature communications 11(1):166
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. 
Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. 
Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
  6. Chen S, Jung Y (2021) Deep retrosynthetic reaction prediction using local reactivity and global attention. JACS Au 1(10):1612–1620
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Coley CW, Green WH, Jensen KF (2019) Rdchiral: An rdkit wrapper for handling stereochemistry in retrosynthetic template extraction and application. Journal of chemical information and modeling 59(6):2529–2537 Corey and Wipke (1969) Corey EJ, Wipke WT (1969) Computer-assisted design of complex organic syntheses: Pathways for molecular synthesis can be devised with a computer and equipment for graphical communication. Science 166(3902):178–192 Corey et al (1985) Corey EJ, Long AK, Rubenstein SD (1985) Computer-assisted analysis in organic synthesis. Science 228(4698):408–418 Dai et al (2019) Dai H, Li C, Coley C, et al (2019) Retrosynthesis prediction with conditional graph logic network. 
In: Advances in Neural Information Processing Systems Dubrovskiy et al (2018) Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. 
Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. 
Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Corey EJ, Wipke WT (1969) Computer-assisted design of complex organic syntheses: Pathways for molecular synthesis can be devised with a computer and equipment for graphical communication. Science 166(3902):178–192 Corey et al (1985) Corey EJ, Long AK, Rubenstein SD (1985) Computer-assisted analysis in organic synthesis. Science 228(4698):408–418 Dai et al (2019) Dai H, Li C, Coley C, et al (2019) Retrosynthesis prediction with conditional graph logic network. 
In: Advances in Neural Information Processing Systems Dubrovskiy et al (2018) Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. 
Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. 
Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Corey EJ, Long AK, Rubenstein SD (1985) Computer-assisted analysis in organic synthesis. Science 228(4698):408–418 Dai et al (2019) Dai H, Li C, Coley C, et al (2019) Retrosynthesis prediction with conditional graph logic network. 
In: Advances in Neural Information Processing Systems Dubrovskiy et al (2018) Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. 
In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. 
PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. 
Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. 
In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. 
Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. 
Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. 
Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. 
ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. 
Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. 
Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. 
In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. 
Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. 
Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
  7. Coley CW, Rogers L, Green WH, et al (2017) Computer-assisted retrosynthesis based on molecular similarity. ACS central science 3(12):1237–1245
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Corey EJ, Long AK, Rubenstein SD (1985) Computer-assisted analysis in organic synthesis. Science 228(4698):408–418 Dai et al (2019) Dai H, Li C, Coley C, et al (2019) Retrosynthesis prediction with conditional graph logic network. In: Advances in Neural Information Processing Systems Dubrovskiy et al (2018) Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. 
Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. 
Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Dai H, Li C, Coley C, et al (2019) Retrosynthesis prediction with conditional graph logic network. In: Advances in Neural Information Processing Systems Dubrovskiy et al (2018) Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. 
Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. 
ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. 
Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. 
Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. 
In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. 
PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. 
Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. 
Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. 
Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
  8. Coley CW, Green WH, Jensen KF (2019) Rdchiral: An rdkit wrapper for handling stereochemistry in retrosynthetic template extraction and application. Journal of chemical information and modeling 59(6):2529–2537
PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Corey EJ, Long AK, Rubenstein SD (1985) Computer-assisted analysis in organic synthesis. Science 228(4698):408–418 Dai et al (2019) Dai H, Li C, Coley C, et al (2019) Retrosynthesis prediction with conditional graph logic network. In: Advances in Neural Information Processing Systems Dubrovskiy et al (2018) Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. 
In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Dai H, Li C, Coley C, et al (2019) Retrosynthesis prediction with conditional graph logic network. In: Advances in Neural Information Processing Systems Dubrovskiy et al (2018) Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. 
In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. 
In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. 
Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. 
Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. 
In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. 
Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
  9. Corey EJ, Wipke WT (1969) Computer-assisted design of complex organic syntheses: Pathways for molecular synthesis can be devised with a computer and equipment for graphical communication. Science 166(3902):178–192
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Corey EJ, Long AK, Rubenstein SD (1985) Computer-assisted analysis in organic synthesis. Science 228(4698):408–418 Dai et al (2019) Dai H, Li C, Coley C, et al (2019) Retrosynthesis prediction with conditional graph logic network. In: Advances in Neural Information Processing Systems Dubrovskiy et al (2018) Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. 
In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Dai H, Li C, Coley C, et al (2019) Retrosynthesis prediction with conditional graph logic network. In: Advances in Neural Information Processing Systems Dubrovskiy et al (2018) Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. 
In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. 
Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. 
Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. 
Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. 
In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. 
p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. 
Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. 
Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
  10. Corey EJ, Long AK, Rubenstein SD (1985) Computer-assisted analysis in organic synthesis. Science 228(4698):408–418
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1 Durant et al (2002) Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. 
Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Durant JL, Leland BA, Henry DR, et al (2002) Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences 42(6):1273–1280 Gao et al (2023) Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. 
Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. 
p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Gao C, Killeen BD, Hu Y, et al (2023) Synthetic data accelerates the development of generalizable learning-based algorithms for x-ray image analysis. Nature Machine Intelligence 5(3):294–308 Hendrickson (1991) Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. 
In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. 
In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. 
p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). 
doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. 
Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. 
Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. 
Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
  11. Dai H, Li C, Coley C, et al (2019) Retrosynthesis prediction with conditional graph logic network. In: Advances in Neural Information Processing Systems
Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. 
Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. 
Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. 
ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. 
Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. 
Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. 
In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. 
In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). 
doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. 
Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. 
ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. 
Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. 
Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. 
Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. 
Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. 
Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
  12. Dubrovskiy AV, Kesharwani T, Markina NA, et al (2018) Comprehensive Organic Transformations, 4 Volume Set: A Guide to Functional Group Preparations, vol 1
In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190 Irwin et al (2020) Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. 
Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. 
In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. 
Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. 
In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. 
Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073 Irwin et al (2022) Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. 
Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. 
In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. 
Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. 
In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). 
doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. 
PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. 
Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. 
Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. 
Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. 
Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. 
Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. 
Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. 
Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. 
Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
  15. Hendrickson JB (1991) Concepts and applications of molecular similarity. Science 252(5009):1189–1190
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015,022 Jin et al (2017) Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. 
In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems Karpov et al (2019) Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. 
Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. 
Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. 
In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. 
Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
  16. Irwin JJ, Tang KG, Young J, et al (2020) Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60(12):6065–6073
In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. 
ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. 
Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. 
Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. 
Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. 
Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
  17. Irwin R, Dimitriadis S, He J, et al (2022) Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015022
  18. Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. In: Advances in neural information processing systems
  19. Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830
  20. Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133
  21. Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109
  22. Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning
  23. Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72
  24. Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8
  25. Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148
  26. Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364
  27. Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113
  28. Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1
  29. Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge
  30. Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204
  31. Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166
  32. Mikulak-Klucznik B, Gołębiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88
  33. Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148
  34. Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026
  35. Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems
  36. Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8
  37. Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754
  38. Sacha M, Błaż M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284
  39. Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346
  40. Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152
  41. Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971
  42. Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610
  43. Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539
  44. Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827
  45. Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415
  46. Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958
  47. Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10186–10194
  48. Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575
  49. Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494
  50. Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513
  51. Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186
  52. Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems
  53. Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22475–22490
  54. Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129845
  55. Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36
  56. Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996
  57. Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11248–11258
  58. Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94
  59. Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151
  60. Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009
  61. Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830 Kim et al (2021) Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. 
In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. 
Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133 Kim et al (2019) Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. 
In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. 
p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. 
Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. 
Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
  18. Jin W, Coley C, Barzilay R, et al (2017) Predicting organic reaction outcomes with Weisfeiler-Lehman network. In: Advances in neural information processing systems
doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. 
In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. 
Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. 
Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
  19. Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, pp 817–830
  20. Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133
  21. Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109
  22. Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning
  23. Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72
  24. Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8
  25. Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. pp 127–148
  26. Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364
  27. Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113
  28. Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1
  29. Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge
  30. Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204
  31. Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166
  32. Mikulak-Klucznik B, Gołębiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88
  33. Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148
  34. Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026
  35. Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems
  36. Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8
  37. Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754
  38. Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284
  39. Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346
  40. Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152
  41. Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971
  42. Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610
  43. Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539
  44. Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827
  45. Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415
  46. Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958
  47. Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10186–10194
  48. Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575
  49. Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494
  50. Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513
  51. Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186
  52. Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems
  53. Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22475–22490
  54. Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129845
  55. Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36
  56. Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996
  57. Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11248–11258
  58. Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94
  59. Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151
  60. Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009
  61. Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109 Kingma and Ba (2017) Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning Klein et al (2017) Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. 
Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. 
In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Klein G, Kim Y, Deng Y, et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72 Landrum et al (2013) Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). 
doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. 
Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. 
Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. 
Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
  20. Kim E, Lee D, Kwon Y, et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61(1):123–133
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. 
Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. 
Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. 
Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. 
Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
  21. Kim S, Chen J, Cheng T, et al (2019) Pubchem 2019 update: improved access to chemical data. Nucleic acids research 47(D1):D1102–D1109
In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. 
Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. 
Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. 
Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
  22. Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. In: International conference on machine learning
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Landrum G, et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 Lawson et al (2014) Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148 Lin et al (2020) Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. 
Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. 
Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. 
Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. 
Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. 
QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. 
Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. 
Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. 
Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
  25. Lawson AJ, Swienty-Busch J, Géoui T, et al (2014) The making of reaxys—towards unobstructed access to relevant chemistry information. In: The Future of the History of Chemical Information. p 127–148
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. 
Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. 
Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. 
Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
  26. Lin K, Xu Y, Pei J, et al (2020) Automatic retrosynthetic route planning using template-free models. Chemical science 11(12):3355–3364 Liu et al (2017) Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113 Lowe (2017) Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. 
Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Lowe D (2017) Chemical reactions from US patents (1976-Sep2016). doi: 10.6084/m9.figshare.5104873.v1 Lowe (2012) Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge Maggiora et al (2014) Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204 Marouf et al (2020) Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. 
Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. 
Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
  27. Liu B, Ramsundar B, Kawthekar P, et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science 3(10):1103–1113
Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. 
Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. 
In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. 
Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. 
Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell rna-seq data using generative adversarial networks. Nature communications 11(1):166 Mikulak-Klucznik et al (2020) Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. 
Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Mikulak-Klucznik B, Golkebiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88 Muegge and Mukherjee (2016) Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. 
Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. 
Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. 
Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. 
In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. 
Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
  29. Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
  30. Maggiora G, Vogt M, Stumpfe D, et al (2014) Molecular similarity in medicinal chemistry: miniperspective. Journal of medicinal chemistry 57(8):3186–3204
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. 
Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. 
Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International Conference on Machine Learning, pp 8818–8827
Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415
Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15(1):1929–1958
Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10186–10194
Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis. Nature Communications 11(1):5575
Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494
Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of Chemical Information and Modeling 62(15):3503–3513
Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature Communications 13(1):1186
Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in Neural Information Processing Systems
Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22475–22490
Wang X, Li Y, Qiu J, et al (2021) RetroPrime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129845
Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences 28(1):31–36
Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of Chemical Information and Computer Sciences 38(6):983–996
Yan C, Ding Q, Zhao P, et al (2020) RetroXpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11248–11258
Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94
Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151
Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009
Zhong Z, Song J, Feng Z, et al (2022) Root-aligned SMILES: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
  31. Marouf M, Machart P, Bansal V, et al (2020) Realistic in silico generation and augmentation of single-cell RNA-seq data using generative adversarial networks. Nature Communications 11(1):166
  32. Mikulak-Klucznik B, Gołębiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148 Nikolova and Jaworska (2003) Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. 
Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22(9-10):1006–1026 Paszke et al (2019) Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. 
Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems Rodrigues (2019) Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. 
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. 
Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. 
Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
  32. Mikulak-Klucznik B, Gołębiowska P, Bayly AA, et al (2020) Computational planning of the synthesis of complex natural products. Nature 588(7836):83–88
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8 Rogers and Hahn (2010) Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. 
Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. 
Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754 Sacha et al (2021) Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. 
Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. 
Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
  33. Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert opinion on drug discovery 11(2):137–148
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. 
In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
  35. Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284 Schneider et al (2016) Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. 
Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. 
Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. 
In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
  36. Rodrigues T (2019) The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discovery Today: Technologies 32:3–8
Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56(12):2336–2346 Schwaller et al (2021) Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. 
Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. 
  37. Rogers D, Hahn M (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50(5):742–754
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3(2):144–152 Segler and Waller (2017) Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. 
Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971 Segler et al (2018) Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610 Seo et al (2021) Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. 
Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
  38. Sacha M, Błaz M, Byrski P, et al (2021) Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling 61(7):3273–3284
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539 Shi et al (2020) Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. 
In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. 
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
  40. Schwaller P, Probst D, Vaucher AC, et al (2021) Mapping the space of chemical reactions using attention-based neural networks. Nature Machine Intelligence 3(2):144–152
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827 Somnath et al (2021) Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415 Srivastava et al (2014) Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958 Sun et al (2021) Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. 
introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10,186–10,194 Tetko et al (2020) Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. 
Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nature communications 11(1):5575 Toniato et al (2021) Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494 Tu and Coley (2022) Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513 Ucak et al (2022) Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
  41. Segler MH, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal 23(25):5966–5971
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. 
Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
  42. Segler MH, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555(7698):604–610
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
  43. Seo SW, Song YY, Yang JY, et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13(1):1186 Vaswani et al (2017) Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. 
Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
  44. Shi C, Xu M, Guo H, et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International conference on machine learning, pp 8818–8827
In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems Wan et al (2022) Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. 
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22,475–22,490 Wang et al (2021) Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. 
Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. 
In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. 
Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
  45. Somnath VR, Bunne C, Coley C, et al (2021) Learning graph models for retrosynthesis prediction. In: Advances in Neural Information Processing Systems, pp 9405–9415
Chemical Engineering Journal 420:129,845 Weininger (1988) Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36 Willett et al (1998) Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996 Yan et al (2020) Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. 
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. 
Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
  46. Srivastava N, Hinton G, Krizhevsky A, et al (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958
  47. Sun R, Dai H, Li L, et al (2021) Towards understanding retrosynthesis by energy-based models. In: Advances in Neural Information Processing Systems, pp 10186–10194
  48. Tetko IV, Karpov P, Van Deursen R, et al (2020) State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis. Nature Communications 11(1):5575
Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11,248–11,258 Yang et al (2022) Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94 Yu et al (2023) Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. 
Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151 Zhong et al (2023) Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009 Zhong et al (2022) Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034 Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
  49. Toniato A, Schwaller P, Cardinale A, et al (2021) Unassisted noise reduction of chemical reaction datasets. Nature Machine Intelligence 3(6):485–494
  50. Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62(15):3503–3513
Chemical Science 13(31):9023–9034
  51. Ucak UV, Ashyrmamatov I, Ko J, et al (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature Communications 13(1):1186
  52. Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in Neural Information Processing Systems
  53. Wan Y, Hsieh CY, Liao B, et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, pp 22475–22490
  54. Wang X, Li Y, Qiu J, et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal 420:129845
  55. Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28(1):31–36
  56. Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. Journal of chemical information and computer sciences 38(6):983–996
  57. Yan C, Ding Q, Zhao P, et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. In: Advances in Neural Information Processing Systems, pp 11248–11258
  58. Yang H, Li J, Lim KZ, et al (2022) Automatic strain sensor design via active learning and data augmentation for soft machines. Nature Machine Intelligence 4(1):84–94
  59. Yu T, Boob AG, Volk MJ, et al (2023) Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 6(2):137–151
  60. Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nature Communications 14(1):3009
  61. Zhong Z, Song J, Feng Z, et al (2022) Root-aligned smiles: a tight representation for chemical reaction prediction. Chemical Science 13(31):9023–9034
