A Model to Search for Synthesizable Molecules (1906.05221v2)

Published 12 Jun 2019 in cs.LG, physics.comp-ph, and stat.ML

Abstract: Deep generative models are able to suggest new organic molecules by generating strings, trees, and graphs representing their structure. While such models allow one to generate molecules with desirable properties, they give no guarantees that the molecules can actually be synthesized in practice. We propose a new molecule generation model, mirroring a more realistic real-world process, where (a) reactants are selected, and (b) combined to form more complex molecules. More specifically, our generative model proposes a bag of initial reactants (selected from a pool of commercially-available molecules) and uses a reaction model to predict how they react together to generate new molecules. We first show that the model can generate diverse, valid and unique molecules due to the useful inductive biases of modeling reactions. Furthermore, our model allows chemists to interrogate not only the properties of the generated molecules but also the feasibility of the synthesis routes. We conclude by using our model to solve retrosynthesis problems, predicting a set of reactants that can produce a target product.

Citations (104)

Summary

The paper introduces a novel generative model that integrates reactant selection with reaction prediction to generate synthesizable molecules.
The model reaches 99.05% validity and 89.11% novelty, outperforming several baseline models on these metrics.
The model incorporates a property predictor to optimize drug-likeness, offering practical insights to accelerate molecular discovery.

Analysis of "A Model to Search for Synthesizable Molecules"

This paper presents a novel approach to the generative modeling of molecules, focusing on synthesizability. The proposed model follows the way molecules are made in practice more closely than previous machine learning methods: it encodes both the selection of reactants and their transformation into product molecules. This matters because the usefulness of a generated molecule ultimately depends on whether it can be synthesized.

Key Contributions and Methodology

The primary contribution of this paper is coupling molecule generation with synthesis prediction. The authors introduce a generative model that selects a bag of initial reactants and uses a reaction model to predict the resulting product molecules. This addresses a significant gap in earlier generative approaches, which offer no guidance on whether a proposed molecule can be synthesized.

Generative Model Structure: The model operates in two stages:
- Selection of Reactants: It first selects a bag of reactant molecules from a pool of commercially-available substances, effectively emulating the decision-making process of a chemist.
- Reaction Prediction: A reaction model forecasts the outcome of these reactants interacting, generating complex product molecules.
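The two stages above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the authors' implementation: the reactant pool, the random sampling in `select_reactant_bag`, and the lookup table in `toy_reaction_model` are all hypothetical stand-ins for the learned components described in the paper.

```python
import random

# Hypothetical pool of purchasable reactants (SMILES strings chosen for illustration).
REACTANT_POOL = ["CCO", "CC(=O)O", "CN", "CC(=O)Cl"]


def select_reactant_bag(pool, max_size, rng):
    """Stage 1: pick a bag (multiset) of reactants from the pool.

    Random sampling is an assumption made for this sketch; the paper's
    model learns which reactants to select.
    """
    size = rng.randint(1, max_size)
    return [rng.choice(pool) for _ in range(size)]


def predict_product(bag, reaction_model):
    """Stage 2: pass the bag to a reaction model and return its prediction.

    The bag is sorted first so that the result does not depend on order.
    """
    return reaction_model(tuple(sorted(bag)))


def toy_reaction_model(bag):
    """Placeholder for a learned reaction predictor: a one-entry lookup table."""
    known = {
        # acetic acid + ethanol -> ethyl acetate
        ("CC(=O)O", "CCO"): "CCOC(C)=O",
    }
    return known.get(bag)  # None when the toy table has no entry


bag = select_reactant_bag(REACTANT_POOL, max_size=3, rng=random.Random(0))
product = predict_product(bag, toy_reaction_model)  # often None with this toy table

print(predict_product(["CCO", "CC(=O)O"], toy_reaction_model))  # CCOC(C)=O
```

Keeping the reaction model as a plain callable means a real predictor could be swapped in without touching the selection step.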
Evaluation Metrics: The authors evaluate generated molecules on validity, uniqueness, novelty, and Fréchet ChemNet Distance (FCD). Together these metrics indicate whether the outputs are chemically valid, diverse, and distributed similarly to a reference set of real molecules.
Property Optimization: The authors include a property predictor within the model to optimize molecules toward desirable attributes, such as drug-likeness, using Quantitative Estimate of Drug-likeness (QED) scores.
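The summary does not spell out how the property predictor is used for optimization. One common pattern, assuming the generator is driven by a continuous latent vector and the predictor is differentiable, is gradient ascent on the predicted score in latent space. The sketch below shows only that pattern: `toy_score` and `toy_grad` are hypothetical stand-ins for a learned QED predictor and its gradient, and `TARGET` is an arbitrary point chosen for illustration.

```python
def gradient_ascent(z, grad_fn, step_size=0.1, n_steps=50):
    """Move a latent vector z uphill on a property score.

    grad_fn(z) must return the gradient of the score with respect to z.
    """
    for _ in range(n_steps):
        grad = grad_fn(z)
        z = [zi + step_size * gi for zi, gi in zip(z, grad)]
    return z


# Hypothetical predictor: the score peaks at TARGET and falls off quadratically.
TARGET = [1.0, -2.0]


def toy_score(z):
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, TARGET))


def toy_grad(z):
    return [-2.0 * (zi - ti) for zi, ti in zip(z, TARGET)]


z_start = [0.0, 0.0]
z_opt = gradient_ascent(z_start, toy_grad)
print(toy_score(z_start), toy_score(z_opt))
```

In a full pipeline the optimized latent vector would then be decoded into a reactant bag and passed to the reaction model, so every optimized molecule still comes with a proposed synthesis step.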

Results and Implications

The model demonstrates high validity (99.05%) and novelty (89.11%) scores, outperforming several baseline models in generating valid and novel molecules. Moreover, the ability to generate chemically stable and synthetically feasible molecules addresses a significant gap in molecular generation, where previously generated molecules often lacked practical application because no tractable synthesis route existed.
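Percentages such as the validity and novelty figures above are typically computed as simple ratios over a batch of generated molecules. The sketch below uses one common set of definitions; the exact denominators vary between benchmarks, and the ones used here are an assumption rather than the paper's stated protocol. It also assumes the SMILES strings are already canonicalized so that string equality means molecular equality. The `balanced_parentheses` check is a toy stand-in for a real validity test (in practice one might use RDKit's `Chem.MolFromSmiles`, which returns `None` for strings it cannot parse).

```python
def generation_metrics(generated, training_set, is_valid):
    """Compute validity, uniqueness, and novelty as fractions in [0, 1].

    validity   = valid molecules / all generated molecules
    uniqueness = distinct valid molecules / valid molecules
    novelty    = distinct valid molecules not in training_set / distinct valid molecules
    """
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def balanced_parentheses(smiles):
    """Toy validity check: only verifies that branch parentheses are balanced."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


generated = ["CCO", "CCO", "CC(=O)O", "C(C"]
training = {"CCO"}
scores = generation_metrics(generated, training, balanced_parentheses)
print(scores["validity"], scores["novelty"])  # 0.75 0.5
```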

The paper introduces a promising direction for integrating machine learning with traditional chemistry, potentially transforming the drug discovery pipeline. The ability to predict synthesis routes alongside generating molecular structures can aid in accelerating experimental processes, reducing the time and cost of drug development. Notably, by suggesting feasible synthesis routes, the model holds the potential to shift focus from merely theoretical molecule design to practical molecular discovery.

Future Developments

Future work could explore expanding the model's vocabulary of reactants and extending the reaction prediction to multi-step synthesis pathways. This would enable the creation of a broader range of molecules, crucial for complex pharmaceutical applications. Also, incorporating more detailed data on reaction conditions and side products could refine the predictive accuracy and practical relevance of the model.

Overall, the paper provides a substantial contribution to the intersection of machine learning and chemistry, proposing a model that not only suggests molecules but also elucidates their pathways of synthesis. This holistic approach can serve as a foundation for future innovations in synthesizable molecule generation, potentially enhancing the practicality of AI-driven methodologies in the chemical and pharmaceutical industries.

GitHub

GitHub - john-bradshaw/molecule-chef: Code for our paper "A Model to Search for Synthesizable Molecules" (https://arxiv.org/abs/1906.05221) (79 stars)