The Synthesizability of Molecules Proposed by Generative Models

Published 17 Feb 2020 in q-bio.QM, cs.LG, and stat.ML | (2002.07007v1)

Abstract: The discovery of functional molecules is an expensive and time-consuming process, exemplified by the rising costs of small molecule therapeutic discovery. One class of techniques of growing interest for early-stage drug discovery is de novo molecular generation and optimization, catalyzed by the development of new deep learning approaches. These techniques can suggest novel molecular structures intended to maximize a multi-objective function, e.g., suitability as a therapeutic against a particular target, without relying on brute-force exploration of a chemical space. However, the utility of these approaches is stymied by ignorance of synthesizability. To highlight the severity of this issue, we use a data-driven computer-aided synthesis planning program to quantify how often molecules proposed by state-of-the-art generative models cannot be readily synthesized. Our analysis demonstrates that there are several tasks for which these models generate unrealistic molecular structures despite performing well on popular quantitative benchmarks. Synthetic complexity heuristics can successfully bias generation toward synthetically-tractable chemical space, although doing so necessarily detracts from the primary objective. This analysis suggests that to improve the utility of these models in real discovery workflows, new algorithm development is warranted.

Citations (231)

Summary

The paper shows that distribution learning models tend to propose synthesizable molecules when trained on curated datasets with a higher fraction of synthesizable compounds.
Goal-directed generation models often produce high-scoring yet non-synthesizable structures, particularly with methods like SMILES GA and Graph GA.
Biasing the objective function with heuristic synthesizability scores improves output viability, though at some cost to the primary objective function.

Synthesizability of Molecules Proposed by Generative Models

The paper "The Synthesizability of Molecules Proposed by Generative Models" by Wenhao Gao and Connor W. Coley addresses a crucial challenge in the field of cheminformatics and drug discovery: the synthetic feasibility of molecular structures proposed by generative models. The authors provide a comprehensive evaluation of the ability of various state-of-the-art generative algorithms to propose synthesizable molecular compounds, with implications for their integration into drug discovery workflows.

Background and Objective

The discovery of functional molecules, particularly in the pharmaceutical industry, is notoriously costly and time-consuming. De novo molecular generation and optimization techniques, driven by modern deep learning methods, have emerged as a way to explore vast chemical spaces for potential therapeutics with desirable properties without resorting to brute-force enumeration. However, a persistent obstacle is the synthesizability of the generated candidates: promising structures often cannot be made with currently available methods or materials.

Methodology and Analysis

The authors use a data-driven computer-aided synthesis planning tool, ASKCOS, to quantify the synthesizability of molecules produced by various generative models. The analysis covers two categories of models: distribution learning models, which interpolate within a chemical space described by training data, and goal-directed generation models, which optimize for specific chemical properties or objectives.
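The quantity measured in this kind of analysis can be sketched as a simple fraction: the share of generated molecules for which a synthesis planner finds a route. The snippet below is a minimal illustration under stated assumptions, not the authors' pipeline. The `has_route` callable is a hypothetical stand-in for a real planner, and `toy_planner` is a placeholder rule with no chemical meaning; no ASKCOS API is used or implied.

```python
from typing import Callable, Iterable


def synthesizable_fraction(
    smiles_list: Iterable[str],
    has_route: Callable[[str], bool],
) -> float:
    """Return the fraction of molecules for which `has_route` reports success.

    `has_route` is a hypothetical hook: in practice it would wrap a call to a
    synthesis planning program and return True when a route is found.
    """
    molecules = list(smiles_list)
    if not molecules:
        raise ValueError("smiles_list must contain at least one molecule")
    solved = sum(1 for smiles in molecules if has_route(smiles))
    return solved / len(molecules)


def toy_planner(smiles: str) -> bool:
    """Placeholder planner for illustration only.

    It treats short SMILES strings as 'solvable'. This has no chemical
    meaning and merely lets the example run without external tools.
    """
    return len(smiles) <= 10


# Example SMILES strings (ethanol, benzene, aspirin, triethylamine).
generated = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O", "CCN(CC)CC"]
fraction = synthesizable_fraction(generated, toy_planner)
print(f"Fraction with a route (toy planner): {fraction:.2f}")
```

Passing the planner in as a callable keeps the bookkeeping separate from the expensive retrosynthetic search, so the same function could be reused for each model or benchmark task.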

Evaluation Metrics

Gao and Coley also consider synthetic complexity heuristics such as the synthetic accessibility score (SA_Score) and the synthetic complexity score (SCScore). These metrics offer fast heuristic estimates of a molecule's ease of synthesis, whereas ASKCOS conducts explicit retrosynthetic analysis to search for viable synthetic pathways.
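One way to relate a heuristic score to explicit planning results is to threshold the score and count how often the resulting label matches the planner outcome. The sketch below assumes a score where lower means easier to synthesize (as with SA_Score, which runs from 1 to 10). The function name, the threshold, and the toy values are all illustrative assumptions, not data or procedures from the paper.

```python
from typing import Sequence


def threshold_agreement(
    scores: Sequence[float],
    planner_found_route: Sequence[bool],
    threshold: float,
) -> float:
    """Fraction of molecules where 'score <= threshold' matches the planner.

    A molecule is labeled 'easy' by the heuristic when its score is at or
    below `threshold`; agreement means that label equals the planner outcome.
    """
    if len(scores) != len(planner_found_route):
        raise ValueError("scores and planner_found_route must be the same length")
    if not scores:
        raise ValueError("at least one molecule is required")
    matches = sum(
        (score <= threshold) == found
        for score, found in zip(scores, planner_found_route)
    )
    return matches / len(scores)


# Toy values for illustration only; these are not results from the paper.
toy_scores = [2.1, 3.4, 5.8, 6.5, 2.9]
toy_routes = [True, True, False, False, False]
agreement = threshold_agreement(toy_scores, toy_routes, threshold=4.5)
print(f"Agreement between heuristic and planner: {agreement:.2f}")
```

In the toy data the last molecule has a low score but no route, which is the kind of disagreement that makes a heuristic an imperfect proxy for explicit planning.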

Key Findings

Distribution Learning Models: Models trained on datasets with a higher fraction of synthesizable compounds (e.g., MOSES) tend to propose molecules that are synthesizable at rates comparable to their training data. This suggests that starting from a curated, synthesizable dataset improves the synthesizability of the output.
Goal-Directed Generation Models: A significant proportion of high-scoring molecules proposed by these models are not synthesizable. SMILES GA and Graph GA are especially problematic: they propose nonsensical structures that score well on evaluation metrics but lack practical synthetic pathways.
Heuristic Biasing: The introduction of heuristic biases, particularly through the normalization of the objective function with synthesizability scores like SA_Score, considerably improves the fraction of synthesizable outputs. However, this improvement often comes at the cost of the primary objective function value.
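The heuristic biasing idea can be sketched as a primary objective multiplied by a weight that shrinks as the synthetic complexity score grows. The form below (no penalty up to a threshold, Gaussian decay above it) is an assumption chosen for illustration and should not be read as the exact function used by the authors. The names `synthesizability_weight` and `biased_objective` and the default `threshold` and `sigma` values are hypothetical.

```python
import math


def synthesizability_weight(
    sa_score: float, threshold: float = 3.0, sigma: float = 1.0
) -> float:
    """Weight in (0, 1] that penalizes high synthetic complexity scores.

    Assumed form: no penalty at or below `threshold`, Gaussian decay above.
    The threshold and sigma defaults are arbitrary illustrative choices.
    """
    if sa_score <= threshold:
        return 1.0
    return math.exp(-((sa_score - threshold) ** 2) / (2.0 * sigma**2))


def biased_objective(
    primary_score: float,
    sa_score: float,
    threshold: float = 3.0,
    sigma: float = 1.0,
) -> float:
    """Primary objective multiplied by the synthesizability weight."""
    return primary_score * synthesizability_weight(sa_score, threshold, sigma)


# A molecule with a slightly lower primary score but low complexity can
# outrank one with a higher primary score and high complexity.
easy = biased_objective(primary_score=0.8, sa_score=2.5)
hard = biased_objective(primary_score=0.9, sa_score=6.0)
print(f"easy-to-make candidate: {easy:.3f}")
print(f"hard-to-make candidate: {hard:.3f}")
```

Because the weight never exceeds 1, the biased score can only match or fall below the primary score, which reflects the trade-off noted above: steering toward tractable molecules can cost some of the primary objective.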

Implications and Future Directions

This work highlights a fundamental consideration in applying generative models to drug discovery: the need to balance molecular novelty and property optimization against synthesizability. Applying heuristic biases or constraints during generation addresses this balance directly, though at the expense of raw objective optimization.

The authors advocate for further development of generative algorithms that intrinsically account for synthetic feasibility, potentially by integrating explicit synthesis planning methodologies or embedding synthetic constraints within the generation process itself. Future advancements in computational efficiency and synthesis prediction accuracy could also enable real-time incorporation of synthesizability assessments during molecule generation, enhancing their practical utility in drug discovery pipelines.

In conclusion, Gao and Coley's analysis provides a critical perspective on the current limitations of generative models and on the advances needed before AI-generated molecular suggestions can be carried into real-world synthesis.
