Re-evaluating Retrosynthesis Algorithms with Syntheseus (2310.19796v3)

Published 30 Oct 2023 in cs.LG, cs.AI, and q-bio.QM

Abstract: Automated Synthesis Planning has recently re-emerged as a research area at the intersection of chemistry and machine learning. Despite the appearance of steady progress, we argue that imperfect benchmarks and inconsistent comparisons mask systematic shortcomings of existing techniques, and unnecessarily hamper progress. To remedy this, we present a synthesis planning library with an extensive benchmarking framework, called syntheseus, which promotes best practice by default, enabling consistent meaningful evaluation of single-step models and multi-step planning algorithms. We demonstrate the capabilities of syntheseus by re-evaluating several previous retrosynthesis algorithms, and find that the ranking of state-of-the-art models changes in controlled evaluation experiments. We end with guidance for future works in this area, and call the community to engage in the discussion on how to improve benchmarks for synthesis planning.

Summary

The paper presents a critical analysis of retrosynthesis algorithms, highlighting deficiencies in current benchmarking practices that rely too heavily on recall metrics.
The authors introduce Syntheseus, a Python-based library designed to standardize evaluation of single-step and multi-step retrosynthesis models by addressing post-processing inconsistencies.
The study demonstrates that uniform metrics focusing on efficiency and route diversity can yield a more accurate assessment of model performance and bridge the gap between in-silico predictions and practical synthesis.

An Expert Overview of "Re-evaluating Retrosynthesis Algorithms with Syntheseus"

The paper "Re-evaluating Retrosynthesis Algorithms with Syntheseus" critically re-evaluates retrosynthesis algorithms and argues that robust benchmarking practices are necessary. Retrosynthesis plays a pivotal role in computational chemistry, facilitating the design of synthetic routes for molecular compounds. The authors argue that inconsistent benchmarks and comparisons undermine how retrosynthesis algorithms are currently evaluated. To address this, they introduce syntheseus, a Python-based benchmarking library aimed at making evaluation consistent and standardized.

The paper scrutinizes traditional approaches to retrosynthesis evaluation and highlights several deficiencies, such as a focus on recall rather than precision, which can lead to an overestimation of model performance. In single-step retrosynthesis, one of the primary pitfalls identified is inconsistent post-processing of model outputs. The paper stresses the importance of uniform evaluation metrics that align with the goals of computer-aided synthesis planning, and calls for post-processing steps such as removing invalid outputs and duplicates before metrics are computed.
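To make the post-processing point concrete, the following is a minimal sketch of dropping invalid and duplicate predictions before computing top-k accuracy. It is not the syntheseus implementation: the function names are hypothetical, and `toy_canonicalize` is a stand-in assumption that merely sorts dot-separated components, where a real pipeline would use a cheminformatics toolkit to canonicalize and validate SMILES.

```python
from typing import Callable, Iterable, List, Optional


def toy_canonicalize(smiles: str) -> Optional[str]:
    """Stand-in canonicalizer (assumption, not real chemistry).

    Sorts dot-separated components so that "CCO.CC" and "CC.CCO" compare equal.
    Returns None for empty strings or strings containing internal spaces,
    which this toy treats as invalid.
    """
    s = smiles.strip()
    if not s or " " in s:
        return None
    return ".".join(sorted(s.split(".")))


def postprocess(
    predictions: Iterable[str],
    canonicalize: Callable[[str], Optional[str]] = toy_canonicalize,
) -> List[str]:
    """Drop invalid predictions and duplicates, preserving the model's ranking."""
    seen = set()
    cleaned: List[str] = []
    for prediction in predictions:
        canonical = canonicalize(prediction)
        if canonical is None or canonical in seen:
            continue
        seen.add(canonical)
        cleaned.append(canonical)
    return cleaned


def top_k_hit(
    predictions: Iterable[str],
    ground_truth: str,
    k: int,
    canonicalize: Callable[[str], Optional[str]] = toy_canonicalize,
) -> bool:
    """True if the ground truth is among the first k cleaned predictions."""
    target = canonicalize(ground_truth)
    if target is None:
        return False
    return target in postprocess(predictions, canonicalize)[:k]


raw = ["CCO.CC", "CC.CCO", "", "CO.C", "C.CO"]
cleaned = postprocess(raw)
hit_at_2 = top_k_hit(raw, "C.CO", k=2)
```

In this toy input the first two raw predictions are the same reactant set written in different orders, so whether the third distinct answer counts as a top-2 hit depends entirely on whether duplicates were removed first. That is the kind of inconsistency that makes reported numbers hard to compare across papers.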

In discussing multi-step retrosynthesis, the paper emphasizes the drawbacks of using success rate as a comparison metric. Success rate matters, but it says little about the quality of the routes found, so it cannot be relied upon on its own to rank single-step models. The authors advocate measuring the efficiency of search and the diversity of proposed routes, suggesting metrics such as the number of non-overlapping routes, which give a more nuanced view of an algorithm's effectiveness.
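One way to read the "number of non-overlapping routes" metric is as the size of the largest set of found routes in which no two routes share a reaction. The sketch below rests on that assumption; this summary does not spell out the paper's precise definition of overlap, and the function name and the representation of a route as a collection of reaction identifiers are hypothetical. It uses an exhaustive include-or-skip search, which is exponential in the worst case and only suitable for small route sets.

```python
from typing import FrozenSet, Iterable, List, Sequence


def count_non_overlapping_routes(routes: Sequence[Iterable[str]]) -> int:
    """Size of the largest subset of routes in which no two share a reaction.

    Each route is given as an iterable of reaction identifiers (assumption).
    """
    route_sets: List[FrozenSet[str]] = [frozenset(route) for route in routes]

    def best(index: int, used: FrozenSet[str]) -> int:
        # All routes considered: nothing more to add.
        if index == len(route_sets):
            return 0
        # Option 1: skip the current route.
        result = best(index + 1, used)
        # Option 2: take it, if it shares no reaction with routes already taken.
        if route_sets[index].isdisjoint(used):
            result = max(result, 1 + best(index + 1, used | route_sets[index]))
        return result

    return best(0, frozenset())


found_routes = [
    ["r1", "r2"],
    ["r2", "r3"],
    ["r3", "r4"],
    ["r5"],
]
diversity = count_non_overlapping_routes(found_routes)
```

Here four routes were found, but only three of them can be chosen without reusing a reaction, so a plain route count would overstate how many genuinely different options the search produced.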

The syntheseus library is proposed as a remedy to these challenges. It provides a standardized platform to evaluate and compare single-step and multi-step retrosynthesis models, ensuring consistency in inputs, outputs, and post-processing, which addresses many pitfalls previously highlighted. The library supports a modular design that allows easy integration of various models and search algorithms, helping researchers to evaluate not just quantitative accuracy but also inference speed, which is crucial for real-world applications.
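The value of a shared interface can be illustrated with a small sketch in which any model exposing the same `predict` method is scored by the same evaluation loop, with accuracy and inference time measured together. Everything here is hypothetical and chosen for illustration: `SingleStepModel`, `LookupModel`, and `evaluate` are not the syntheseus API, and the lookup table stands in for a trained model.

```python
import time
from typing import Dict, List, Protocol, Sequence, Tuple


class SingleStepModel(Protocol):
    """Hypothetical interface: product SMILES in, ranked reactant SMILES out."""

    def predict(self, product: str, num_results: int) -> List[str]:
        ...


class LookupModel:
    """Toy model backed by a fixed table, standing in for a trained network."""

    def __init__(self, table: Dict[str, List[str]]) -> None:
        self.table = table

    def predict(self, product: str, num_results: int) -> List[str]:
        return self.table.get(product, [])[:num_results]


def evaluate(
    model: SingleStepModel,
    test_set: Sequence[Tuple[str, str]],
    k: int,
) -> Dict[str, float]:
    """Score any conforming model with the same loop: top-k accuracy and speed."""
    hits = 0
    start = time.perf_counter()
    for product, true_reactants in test_set:
        if true_reactants in model.predict(product, num_results=k):
            hits += 1
    elapsed = time.perf_counter() - start
    n = len(test_set)
    return {
        "top_k_accuracy": hits / n if n else 0.0,
        "seconds_per_molecule": elapsed / n if n else 0.0,
    }


model = LookupModel({"P1": ["A.B", "C.D"], "P2": ["E.F"]})
test_set = [("P1", "C.D"), ("P2", "X.Y")]
report = evaluate(model, test_set, k=2)
```

Because the loop depends only on the `predict` signature, swapping in a different model changes nothing about how inputs are fed, how outputs are compared, or how time is measured.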

In their experiments, the authors use syntheseus to re-evaluate several existing single-step retrosynthesis models, including Chemformer, GLN, and RetroKNN, and identify discrepancies in previously reported performance that stem from inconsistent benchmarking practices. They extend this evaluation to the models' generalization capabilities using the proprietary Pistachio dataset, which shows how these models perform beyond commonly used benchmarks like USPTO-50K.

The paper further explores the performance of single-step models within multi-step search, combining them with algorithms like Monte Carlo Tree Search (MCTS) and RetroStar to give a more complete picture of existing methods. These experiments reveal not only the specific capabilities of individual model and search combinations but also the improvements that become visible under a standardized evaluation framework.
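How a single-step model plugs into a multi-step search can be sketched with a deliberately simple depth-limited recursive search. This is not MCTS or RetroStar, and it is not how syntheseus implements search; the function name, the `stats` counter, and the lookup-table model are assumptions made for illustration. A molecule counts as solved if it is purchasable, or if some predicted reaction has all of its reactants solved.

```python
from typing import Callable, Dict, List, Set


def solve(
    target: str,
    single_step: Callable[[str], List[str]],
    purchasable: Set[str],
    max_depth: int,
    stats: Dict[str, int],
) -> bool:
    """Depth-limited search: True if target can be reduced to purchasable molecules."""
    if target in purchasable:
        return True
    if max_depth == 0:
        return False
    # Each expansion costs one call to the single-step model.
    stats["model_calls"] += 1
    for reactants in single_step(target):
        # A reaction works only if every one of its reactants is solved.
        if all(
            solve(r, single_step, purchasable, max_depth - 1, stats)
            for r in reactants.split(".")
        ):
            return True
    return False


# Toy single-step model: a fixed table of ranked reactant sets per product.
table = {"T": ["X.Y", "A.B"], "A": ["C"]}
purchasable = {"B", "C"}
stats = {"model_calls": 0}
solved = solve("T", lambda mol: table.get(mol, []), purchasable, 3, stats)
```

The model's top-ranked suggestion for `T` leads to a dead end, so the search spends an extra model call before succeeding through the second suggestion. Counting model calls alongside success is one simple way to see that two single-step models with the same success rate can differ in how efficiently they lead a search to a route.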

Syntheseus is a promising tool for retrosynthesis research and may guide future developments in computational chemistry. However, the paper also acknowledges remaining challenges, such as the need for models that can evaluate the feasibility of reactions, which would help bridge the gap between in-silico predictions and practical lab synthesis.

Overall, "Re-evaluating Retrosynthesis Algorithms with Syntheseus" makes a significant contribution to retrosynthesis research by providing a robust framework that addresses current evaluation challenges. By encouraging standardized and rigorous benchmarking, syntheseus can help ensure that retrosynthesis algorithms are evaluated fairly and comprehensively, which in turn supports their adoption in real-world synthetic chemistry applications.
