Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search (2006.15820v1)

Published 29 Jun 2020 in cs.LG, cs.AI, and stat.ML

Abstract: Retrosynthetic planning is a critical task in organic chemistry which identifies a series of reactions that can lead to the synthesis of a target product. The vast number of possible chemical transformations makes the size of the search space very big, and retrosynthetic planning is challenging even for experienced chemists. However, existing methods either require expensive return estimation by rollout with high variance, or optimize for search speed rather than the quality. In this paper, we propose Retro*, a neural-based A*-like algorithm that finds high-quality synthetic routes efficiently. It maintains the search as an AND-OR tree, and learns a neural search bias with off-policy data. Then guided by this neural network, it performs best-first search efficiently during new planning episodes. Experiments on benchmark USPTO datasets show that, our proposed method outperforms existing state-of-the-art with respect to both the success rate and solution quality, while being more efficient at the same time.

Citations (90)

Summary

The paper presents Retro*, a novel algorithm that integrates neural guidance with A* search to improve retrosynthetic planning.
It employs an AND-OR tree framework and cost estimation via a neural network to navigate complex chemical reaction spaces efficiently.
Experimental results on USPTO data show that Retro* achieves a higher success rate and finds higher-quality routes (shorter and lower in cost) than existing methods.

Learning Retrosynthetic Planning with Neural Guided A* Search

The paper "Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search" addresses retrosynthetic planning in organic chemistry: finding a sequence of reactions that synthesizes a target molecule from available starting materials. The number of potential reaction pathways is vast, which makes planning labor-intensive even for skilled chemists.

Methodology

The authors introduce Retro*, a planning algorithm that performs a neural-guided best-first tree search modeled on A*. Retro* is designed to improve both the efficiency of retrosynthetic planning and the quality of the synthetic routes it finds, using a neural network to guide the search. Its main components are as follows:

AND-OR Tree Representation: Retro* represents the search space as an AND-OR tree. AND nodes correspond to chemical reactions: a reaction is satisfied only when all of its child nodes (reactants) are satisfied. OR nodes correspond to molecules: a molecule is satisfied as soon as any one of its child nodes (reactions) is satisfied. This lets the planner exploit the subproblem structure inherent in retrosynthetic planning.
Neural-Guided Search: The algorithm uses a neural network to estimate the cost of synthesizing each candidate molecule, steering the search toward more promising pathways. This contrasts with earlier approaches, which either optimize for search speed rather than solution quality or rely on rollout-based return estimation, which is computationally expensive and has high variance.
Learning from Historical Data: The network is trained on past retrosynthetic planning data (the off-policy data mentioned in the abstract), so it learns to predict the cost of synthesizing a given molecule and thereby refines the guidance used during search.
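
The components above can be sketched together in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The reaction table REACTIONS, the set BUILDING_BLOCKS, and the functions one_step and value_fn are hypothetical stand-ins for a one-step retrosynthesis model, a catalog of purchasable molecules, and the learned cost estimator. The sketch assumes the toy data has no cycles, recomputes all estimates on every iteration for clarity rather than speed, and stops once no unexpanded molecule could lead to a cheaper route than the best one already found.

```python
import math
from dataclasses import dataclass, field

# Hypothetical toy data: purchasable molecules and a one-step reaction table
# mapping product -> list of (reaction cost, reactants).
BUILDING_BLOCKS = {"A", "B", "C"}
REACTIONS = {
    "T": [(1.0, ["X", "A"]), (2.5, ["B", "C"])],
    "X": [(1.0, ["B", "C"])],
}

def one_step(mol):
    """Stand-in for a one-step retrosynthesis model."""
    return REACTIONS.get(mol, [])

def value_fn(mol):
    """Stand-in for the learned cost estimate; 0.0 never overestimates."""
    return 0.0

@dataclass
class Mol:  # OR node: satisfied if any one child reaction is satisfied
    name: str
    rxns: list = field(default_factory=list)
    expanded: bool = False

@dataclass
class Rxn:  # AND node: satisfied only if all child molecules are satisfied
    cost: float
    mols: list

def estimate(mol, leaf_value):
    """Cheapest cost to make mol in the current tree, scoring open leaves with leaf_value."""
    if mol.name in BUILDING_BLOCKS:
        return 0.0
    if not mol.expanded:
        return leaf_value(mol.name)
    return min((r.cost + sum(estimate(c, leaf_value) for c in r.mols)
                for r in mol.rxns), default=math.inf)

def rn(mol):           # optimistic estimate using the value function
    return estimate(mol, value_fn)

def solved_cost(mol):  # cost of the best complete route found so far
    return estimate(mol, lambda _name: math.inf)

def frontier(mol, vt, out):
    """Collect (estimated total route cost through leaf, leaf) for every open leaf."""
    if mol.name in BUILDING_BLOCKS:
        return
    if not mol.expanded:
        out.append((vt, mol))
        return
    for r in mol.rxns:
        r_rn = r.cost + sum(rn(c) for c in r.mols)
        if math.isinf(r_rn):
            continue  # dead end below this reaction
        for c in r.mols:
            frontier(c, vt - rn(mol) + r_rn, out)

def extract_route(mol):
    """List (product, reactants) pairs along the cheapest solved route."""
    if mol.name in BUILDING_BLOCKS or math.isinf(solved_cost(mol)):
        return []
    best = min(mol.rxns, key=lambda r: r.cost + sum(solved_cost(c) for c in r.mols))
    route = [(mol.name, [c.name for c in best.mols])]
    for c in best.mols:
        route += extract_route(c)
    return route

def retro_star_sketch(target, max_expansions=100):
    root = Mol(target)
    for _ in range(max_expansions):
        out = []
        frontier(root, rn(root), out)
        # Stop when no open leaf can beat the best complete route.
        if not out or solved_cost(root) <= min(v for v, _ in out):
            break
        _, leaf = min(out, key=lambda pair: pair[0])  # best-first selection
        leaf.rxns = [Rxn(c, [Mol(m) for m in reactants])
                     for c, reactants in one_step(leaf.name)]
        leaf.expanded = True
    return solved_cost(root), extract_route(root)

print(retro_star_sketch("T"))
```

On this toy table the first expansion already yields a complete one-reaction route of cost 2.5, but the search continues and returns the two-reaction route through X with total cost 2.0. That is the practical difference between stopping at the first route found and stopping only when the cost estimates rule out anything cheaper.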

Experimental Results

Empirical evaluations were conducted on benchmark datasets derived from the United States Patent and Trademark Office (USPTO) reaction data, an extensive collection of chemical reactions. The experiments show that Retro* outperforms state-of-the-art algorithms on two key metrics: the success rate of finding a synthesis route for a target molecule, and the quality of the routes found, measured by total route cost and number of reactions.

Success Rate: Retro* achieved a higher success rate compared to other algorithms, significantly surpassing methods such as Monte Carlo Tree Search and DFPN-E.
Route Quality: Retro* identified routes that were shorter and less costly than those produced by other methods, attesting to the effectiveness of its neural-guided mechanism.
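
As a rough illustration of how these two metrics could be aggregated over a set of target molecules, here is a small sketch. The field names (solved, route_cost, num_reactions) and the values in TOY_RESULTS are hypothetical and are not taken from the paper's experiments.

```python
from statistics import mean

def summarize(results):
    """Aggregate planner outputs into success rate and average route quality.

    Each result is a dict with 'solved' (bool) and, when solved,
    'route_cost' (float) and 'num_reactions' (int). Field names are hypothetical.
    """
    solved = [r for r in results if r["solved"]]
    return {
        # Fraction of targets for which some complete route was found.
        "success_rate": len(solved) / len(results) if results else 0.0,
        # Quality metrics are averaged over solved targets only.
        "avg_route_cost": mean(r["route_cost"] for r in solved) if solved else None,
        "avg_num_reactions": mean(r["num_reactions"] for r in solved) if solved else None,
    }

# Made-up planner outputs for four targets, one of them unsolved.
TOY_RESULTS = [
    {"solved": True, "route_cost": 1.5, "num_reactions": 2},
    {"solved": True, "route_cost": 4.0, "num_reactions": 4},
    {"solved": False},
    {"solved": True, "route_cost": 3.5, "num_reactions": 3},
]

print(summarize(TOY_RESULTS))
```

Averaging route cost and length over solved targets only means that a planner solving more (and harder) targets can look worse on these averages, so comparisons between methods are most meaningful on the targets that all of them solve.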

Implications and Future Directions

The introduction of Retro* represents a considerable advancement in automating retrosynthetic planning by combining traditional search algorithms with neural networks' predictive capabilities. The approach highlights several practical implications:

Enhanced Efficiency: The improved success rate and route quality can expedite the drug discovery process, where rapid and accurate synthesis of target compounds is crucial.
Scalable Solutions: The framework provided by Retro* can be applied to larger sets of reactions, continuously learning and refining its predictions as more data become available.
Broader Applicability: Although Retro* targets retrosynthetic planning in chemistry, its architecture and learning strategy may carry over to other domains that require hierarchical planning, such as task scheduling and automated theorem proving.

In conclusion, Retro* merges machine learning and traditional planning techniques, showcasing a significant step toward more capable and efficient retrosynthetic planning systems. Future research may explore extending Retro*'s capabilities to incorporate more complex chemical constraints and reactions, further improving its utility in practical and theoretical applications in chemistry and beyond.
