Learning to Plan Chemical Syntheses (1708.04202v1)

Published 14 Aug 2017 in cs.AI, cs.LG, and physics.chem-ph

Abstract: From medicines to materials, small organic molecules are indispensable for human well-being. To plan their syntheses, chemists employ a problem solving technique called retrosynthesis. In retrosynthesis, target molecules are recursively transformed into increasingly simpler precursor compounds until a set of readily available starting materials is obtained. Computer-aided retrosynthesis would be a highly valuable tool, however, past approaches were slow and provided results of unsatisfactory quality. Here, we employ Monte Carlo Tree Search (MCTS) to efficiently discover retrosynthetic routes. MCTS was combined with an expansion policy network that guides the search, and an "in-scope" filter network to pre-select the most promising retrosynthetic steps. These deep neural networks were trained on 12 million reactions, which represents essentially all reactions ever published in organic chemistry. Our system solves almost twice as many molecules and is 30 times faster in comparison to the traditional search method based on extracted rules and hand-coded heuristics. Finally after a 60 year history of computer-aided synthesis planning, chemists can no longer distinguish between routes generated by a computer system and real routes taken from the scientific literature. We anticipate that our method will accelerate drug and materials discovery by assisting chemists to plan better syntheses faster, and by enabling fully automated robot synthesis.



Summary

The paper presents 3N-MCTS, a method that combines Monte Carlo Tree Search with three deep neural networks to efficiently perform retrosynthetic analysis of small organic molecules.
The paper reports solving nearly twice as many test molecules as the traditional heuristic search (95.24% success rate for 3N-MCTS) while planning syntheses 30 times faster.
The paper validates its approach through A/B tests with 45 postgraduate chemists, who significantly preferred 3N-MCTS routes over those from the traditional heuristic method.


The authors present a method for planning chemical syntheses that combines Monte Carlo Tree Search (MCTS) with deep neural networks (DNNs). The approach targets the main limitations of earlier computer-assisted synthesis planning (CASP) systems, which have long been criticized as slow and as producing routes of unsatisfactory quality.

The central problem tackled by this paper involves the retrosynthetic analysis of small organic molecules, a cornerstone technique in organic chemistry. Retrosynthesis entails breaking down target molecules into simpler precursor compounds through a recursive process until commercially available starting materials are identified. Traditional methods, relying heavily on hand-coded heuristics and rules, have typically fallen short in efficiency and reliability.
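The recursive decomposition described above can be illustrated with a minimal sketch. The molecule names, the `RULES` table, and the `BUILDING_BLOCKS` set are hypothetical placeholders chosen for illustration, not data from the paper, and the depth-first search here is a plain baseline rather than the authors' method.

```python
from typing import Dict, List, Optional, Set

# Hypothetical rule table: each target maps to alternative disconnections,
# and each disconnection is a list of precursor molecules.
RULES: Dict[str, List[List[str]]] = {
    "amide": [["acid", "amine"]],
    "acid": [["ester"]],
}

# Hypothetical set of readily available starting materials.
BUILDING_BLOCKS: Set[str] = {"amine", "ester"}


def plan(target: str, depth: int = 5) -> Optional[List[str]]:
    """Return the retrosynthetic steps for `target`, or None if no route is found."""
    if target in BUILDING_BLOCKS:
        return []  # already a starting material, nothing left to do
    if depth == 0:
        return None  # give up on routes that are too long
    for precursors in RULES.get(target, []):
        steps = [f"{target} <= {' + '.join(precursors)}"]
        for precursor in precursors:
            sub_route = plan(precursor, depth - 1)
            if sub_route is None:
                break  # this disconnection fails, try the next one
            steps.extend(sub_route)
        else:
            return steps  # every precursor was resolved
    return None


route = plan("amide")
print(route)
```

A route counts as solved only when every branch ends in a starting material, which is why a single unresolved precursor rejects the whole disconnection.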

In their proposed method, the authors integrate three DNNs into the MCTS framework, collectively termed 3N-MCTS. The first, the expansion policy network, is trained on roughly 12 million reactions from a comprehensive database and guides the search by proposing the most probable transformations for a given molecule. The second, the in-scope filter network, predicts whether a proposed reaction is likely to be feasible, acting as a fast filter that discards improbable steps. The third, the rollout policy network, is used during the simulation (rollout) phase to sample transformations quickly when estimating how promising a position is, rather than searching the solution space uniformly at random.
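As a rough sketch of how a policy network and a feasibility filter can work together, the function below keeps the highest-ranked proposals and then drops those the filter scores as unlikely. The rule names, probabilities, `top_k`, and `threshold` are all assumptions made for illustration; the summary does not state the values the authors used, and plain dictionaries stand in for the trained networks.

```python
from typing import Callable, Dict, List, Tuple


def select_candidates(
    policy_probs: Dict[str, float],
    in_scope: Callable[[str], float],
    top_k: int = 3,
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Keep the top_k rules by policy probability, then drop infeasible ones."""
    ranked = sorted(policy_probs.items(), key=lambda kv: kv[1], reverse=True)
    shortlisted = ranked[:top_k]
    return [(rule, p) for rule, p in shortlisted if in_scope(rule) >= threshold]


# Hypothetical expansion-policy output for one molecule.
probs = {"amide_coupling": 0.60, "suzuki": 0.25, "wittig": 0.10, "aldol": 0.05}
# Hypothetical in-scope scores; a real filter would be a trained classifier.
feasibility = {"amide_coupling": 0.9, "suzuki": 0.2, "wittig": 0.7, "aldol": 0.8}

kept = select_candidates(probs, lambda rule: feasibility[rule])
print(kept)
```

Ranking before filtering means the filter only has to score a handful of proposals per molecule, which is one plausible reason to order the two steps this way.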

Key Technical Contributions

Efficiency and Coverage: The system draws on transformation rules extracted automatically from a reaction database spanning the history of organic chemistry. These rules cover 79% of the chemical reactions published after 2014.
Performance Metrics: The authors report a substantial improvement in performance metrics. Their method solves almost double the number of molecules compared to previous systems, and does so 30 times faster. Specifically, 3N-MCTS achieves a success rate for solving test molecules of 95.24%, significantly outperforming heuristic BFS (55.53%) and neural BFS (84.24%).
Real-World Validation: The approach was evaluated in A/B tests with 45 postgraduate chemists. The chemists could not reliably distinguish routes generated by 3N-MCTS from routes taken from the scientific literature. They also significantly preferred 3N-MCTS routes over those produced by the traditional heuristic method (68.2% vs. 31.8%).
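To put the reported success rates on a common scale, the short calculation below expresses each one relative to heuristic BFS. It uses only the percentages quoted above; the variable names are illustrative.

```python
# Success rates (percent of test molecules solved) as quoted in this summary.
success = {"3N-MCTS": 95.24, "neural BFS": 84.24, "heuristic BFS": 55.53}

baseline = success["heuristic BFS"]
ratios = {name: rate / baseline for name, rate in success.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the heuristic BFS success rate")
```

On these figures, 3N-MCTS solves roughly 1.7 times as many test molecules as heuristic BFS, and neural BFS roughly 1.5 times as many.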

Methodological Insights

The paper delineates several critical aspects of the methodology:

Monte Carlo Tree Search: By iterating over phases of selection, expansion, rollout, and update, the MCTS framework dynamically constructs the search tree, allowing the system to focus computational efforts on the most promising pathways.
Neural Network Integration: DNNs supply both policy guidance and feasibility filtering, so the search prioritizes likely transformations and discards implausible ones instead of enumerating the vast space of possible transformations exhaustively.
Automatic Rule Extraction: Extraction of transformation rules from extensive reaction databases facilitates the creation of a robust rule set without the need for manual curation. This addresses the historically significant problem of knowledge encoding in CASP.
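The four MCTS phases listed above can be sketched on a toy problem. This is a generic illustration under stated assumptions, not the authors' implementation: the state is an integer that must be reduced to exactly 0, the "transformations" subtract 1, 2, or 3, selection uses the standard UCB1 rule without learned priors, and the rollout picks moves uniformly at random where 3N-MCTS would use its rollout policy network.

```python
import math
import random
from typing import Dict, Optional

ACTIONS = (1, 2, 3)  # hypothetical "transformations": amounts subtracted from the state


class Node:
    def __init__(self, state: int, parent: Optional["Node"] = None) -> None:
        self.state = state
        self.parent = parent
        self.children: Dict[int, "Node"] = {}
        self.visits = 0
        self.value = 0.0  # sum of rewards backed up through this node

    def is_terminal(self) -> bool:
        return self.state <= 0

    def fully_expanded(self) -> bool:
        return len(self.children) == len(ACTIONS)


def reward(state: int) -> float:
    # Reaching exactly 0 counts as "solved"; overshooting earns nothing.
    return 1.0 if state == 0 else 0.0


def select(node: Node, c: float = 1.4) -> Node:
    # Selection: follow the UCB1-best child while nodes are fully expanded.
    while not node.is_terminal() and node.fully_expanded():
        node = max(
            node.children.values(),
            key=lambda ch: ch.value / ch.visits
            + c * math.sqrt(math.log(node.visits) / ch.visits),
        )
    return node


def expand(node: Node) -> Node:
    # Expansion: add one untried child, unless the node is terminal.
    if node.is_terminal():
        return node
    action = next(a for a in ACTIONS if a not in node.children)
    child = Node(node.state - action, parent=node)
    node.children[action] = child
    return child


def rollout(state: int, rng: random.Random) -> float:
    # Rollout: play uniformly random moves until the episode ends.
    while state > 0:
        state -= rng.choice(ACTIONS)
    return reward(state)


def update(node: Optional[Node], r: float) -> None:
    # Update: back the reward up from the leaf to the root.
    while node is not None:
        node.visits += 1
        node.value += r
        node = node.parent


def mcts(root_state: int, iterations: int = 300, seed: int = 0) -> int:
    """Return the most-visited first action; assumes root_state > 0."""
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        leaf = expand(select(root))
        update(leaf, rollout(leaf.state, rng))
    return max(root.children, key=lambda a: root.children[a].visits)


best = mcts(2)
print(best)
```

From state 2, subtracting 2 solves the problem immediately, so that branch collects a reward on every visit and accumulates the most visits. Returning the most-visited action rather than the one with the highest mean value is a common convention, since visit counts are less sensitive to a few lucky rollouts.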

Implications and Future Directions

The work has both practical and theoretical implications. Practically, generating plausible synthesis plans quickly could accelerate drug discovery by letting chemists explore and validate novel compounds more efficiently. Theoretically, it opens avenues for further use of machine learning in chemistry, particularly in sparse-data domains such as natural product synthesis, where methodology development remains challenging.

However, the method does not yet effectively address certain complex chemistries like natural product synthesis or stereochemistry, areas requiring further research and possibly more sophisticated reasoning algorithms.

Conclusion

In summary, the integration of MCTS and neural networks for chemical synthesis planning represents a substantial advancement in the field. This method reduces the time and effort required to develop synthesis routes, enhances the feasibility and reliability of proposed reactions, and holds promise for future applications in automated robotic synthesis and broader material discovery efforts. This work marks a significant step toward making CASP systems an indispensable tool for chemists, leveraging historical data to inform and enhance contemporary research practices.
