
Rethinking Molecule Synthesizability with Chain-of-Reaction (2509.16084v1)

Published 19 Sep 2025 in cs.LG

Abstract: A well-known pitfall of molecular generative models is that they are not guaranteed to generate synthesizable molecules. There have been considerable attempts to address this problem, but given the exponentially large combinatorial space of synthesizable molecules, existing methods have shown limited coverage of the space and poor molecular optimization performance. To tackle these problems, we introduce ReaSyn, a generative framework for synthesizable projection where the model explores the neighborhood of given molecules in the synthesizable space by generating pathways that result in synthesizable analogs. To fully utilize the chemical knowledge contained in the synthetic pathways, we propose a novel perspective that views synthetic pathways akin to reasoning paths in LLMs. Specifically, inspired by chain-of-thought (CoT) reasoning in LLMs, we introduce the chain-of-reaction (CoR) notation that explicitly states reactants, reaction types, and intermediate products for each step in a pathway. With the CoR notation, ReaSyn can get dense supervision in every reaction step to explicitly learn chemical reaction rules during supervised training and perform step-by-step reasoning. In addition, to further enhance the reasoning capability of ReaSyn, we propose reinforcement learning (RL)-based finetuning and goal-directed test-time compute scaling tailored for synthesizable projection. ReaSyn achieves the highest reconstruction rate and pathway diversity in synthesizable molecule reconstruction and the highest optimization performance in synthesizable goal-directed molecular optimization, and significantly outperforms previous synthesizable projection methods in synthesizable hit expansion. These results highlight ReaSyn's superior ability to navigate combinatorially-large synthesizable chemical space.

Summary

  • The paper demonstrates that ReaSyn's Chain-of-Reaction (CoR) notation significantly improves synthesizability reconstruction by providing explicit stepwise supervision.
  • The framework uses a Transformer-based encoder-decoder and RL finetuning with GRPO to enhance pathway diversity and achieve superior reconstruction rates, e.g., 41.2% on ZINC250k.
  • The modular design and explicit reaction encoding enable effective goal-directed molecular optimization and hit expansion, offering promising advances for AI-driven drug discovery.

Synthesizable Molecular Design via Chain-of-Reaction Reasoning: The ReaSyn Framework

Motivation and Problem Formulation

The generation of synthetically accessible molecules remains a central challenge in computational drug discovery. Existing molecular generative models frequently produce candidates that are not synthesizable under practical laboratory constraints, primarily due to the neglect of synthesizability during multi-objective optimization. Heuristic synthesizability scores and design space constraints have been proposed, but these approaches either fail to capture the non-linear nature of synthesizability or sacrifice explorability and optimization performance. Synthesizable projection—generating synthetic pathways to analogs that are structurally similar and synthesizable—offers a modular solution, but prior methods have not fully leveraged the chemical information embedded in synthetic pathways, resulting in limited coverage and poor optimization.

The ReaSyn Framework and Chain-of-Reaction Notation

ReaSyn introduces a generative framework for synthesizable projection, leveraging a Transformer-based encoder-decoder architecture. The key innovation is the Chain-of-Reaction (CoR) notation, which represents synthetic pathways as explicit sequences of reactants, reaction types, and intermediate products, analogous to chain-of-thought (CoT) reasoning in LLMs. This notation enables dense supervision at every reaction step, facilitating explicit learning of chemical reaction rules and stepwise reasoning during both training and inference.

Figure 1: The ReaSyn framework utilizes an encoder-decoder Transformer to generate synthetic pathways in CoR notation, with intermediate supervision and reasoning via a reaction executor.

The CoR notation contrasts with the previously used postfix notation, which relies on hierarchical classification and molecular fingerprints, leading to information loss and error accumulation. By representing molecules directly in SMILES and including intermediates, CoR provides a unified token vocabulary and removes the need for hierarchical prediction, improving both expressivity and robustness.

Figure 2: Comparison of postfix and CoR notations; CoR explicitly encodes intermediates and reactions, enabling stepwise reasoning and eliminating hierarchical prediction.
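
To make the contrast between the two notations concrete, here is a minimal Python sketch of how one two-step pathway could be serialized either way. The `Step` class, the `[RXN:...]` token format, the reaction names, and the example molecules are all assumptions made for illustration; the paper's actual vocabulary and delimiters may differ. The only point is that the CoR sequence carries an intermediate product after every reaction, while the postfix sequence does not.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    reactants: List[str]  # SMILES of the new building blocks consumed at this step
    reaction: str         # reaction-type label (hypothetical names)
    product: str          # SMILES of the intermediate or final product


def to_postfix(steps: List[Step]) -> List[str]:
    """Postfix-style sequence: building blocks and reactions only."""
    tokens: List[str] = []
    for step in steps:
        tokens.extend(step.reactants)
        tokens.append(f"[RXN:{step.reaction}]")
    return tokens


def to_cor(steps: List[Step]) -> List[str]:
    """CoR-style sequence: each reaction is followed by its product."""
    tokens: List[str] = []
    for step in steps:
        tokens.extend(step.reactants)
        tokens.append(f"[RXN:{step.reaction}]")
        tokens.append(step.product)  # extra supervision signal at every step
    return tokens


# Illustrative pathway: amide coupling followed by a Suzuki coupling.
PATHWAY = [
    Step(["OC(=O)c1ccc(Br)cc1", "NCC"], "amide", "CCNC(=O)c1ccc(Br)cc1"),
    Step(["OB(O)c1ccccc1"], "suzuki", "CCNC(=O)c1ccc(-c2ccccc2)cc1"),
]

if __name__ == "__main__":
    print("postfix:", " ".join(to_postfix(PATHWAY)))
    print("CoR:    ", " ".join(to_cor(PATHWAY)))
```

In this sketch the second step lists only its new building block, because the intermediate from the first step is already available as the running product. In the CoR form that intermediate appears explicitly in the sequence, which is what lets a next-token loss supervise it.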

Training Paradigm: Supervised and RL Finetuning

ReaSyn employs a two-stage training protocol:

  • Supervised Learning: The model is trained on (target molecule, pathway) pairs using next-token prediction, with loss weighting to balance SMILES and non-SMILES tokens. The inclusion of intermediate products in CoR notation provides denser supervision, enhancing the model's ability to learn reaction rules.
  • RL Finetuning: To further improve reasoning and pathway diversity, ReaSyn is finetuned using Group Relative Policy Optimization (GRPO), a variant of PPO, with outcome-based rewards defined by molecular similarity between generated products and targets. The KL regularization term constrains deviation from the supervised model, and only multi-step pathways are used to promote diversity.

    Figure 3: RL finetuning of ReaSyn using the GRPO algorithm, optimizing for pathway diversity and outcome similarity.
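
The outcome-based reward and the group-relative part of GRPO can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's implementation: molecules are stood in for by sets of integer "fingerprint bits", the reward is plain Tanimoto similarity on those sets, and advantages are normalized with the population standard deviation (implementations differ on this detail). The policy-gradient update and the KL term are omitted.

```python
from statistics import mean, pstdev
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the group sampled for the same target.

    GRPO uses the group mean as the baseline instead of a learned value
    function; dividing by the group standard deviation rescales the signal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical target and the products of four pathways sampled for it.
TARGET = {1, 2, 3, 4}
SAMPLES = [{1, 2, 3, 4}, {1, 2}, {1, 2, 3, 5}, {7, 8}]

if __name__ == "__main__":
    rewards = [tanimoto(TARGET, s) for s in SAMPLES]
    for r, adv in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={r:.2f}  advantage={adv:+.2f}")
```

Pathways whose product is closer to the target than the group average receive a positive advantage and are reinforced; the rest are pushed down. When every sample in a group earns the same reward, all advantages are zero and the group contributes no gradient.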

Inference and Test-Time Compute Scaling

During inference, ReaSyn maintains a stack for each pathway, executing reactions and retrieving building blocks via nearest-neighbor search in the SMILES space. Beam search is employed to explore multiple candidate pathways, with scoring functions tailored to the task (reconstruction, optimization, or hit expansion). For goal-directed tasks, a reward model (e.g., a neural property predictor) guides the search, enabling test-time compute scaling analogous to best-of-N sampling and process reward models in LLM reasoning.
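
The stack-based decoding described above can be illustrated with a toy Python sketch. Everything named here is an assumption for illustration: the three-molecule `BUILDING_BLOCKS` catalog, the `ARITY` table, the `[RXN:...]` token format, the use of `difflib` string similarity as a stand-in for nearest-neighbor search over SMILES, and a `toy_executor` that only records which reaction was applied instead of performing real chemistry. The sketch follows a single pathway and ignores intermediate-product tokens; a beam search would keep several such stacks and rank them with a task-specific score.

```python
from difflib import SequenceMatcher
from typing import Dict, List

# Hypothetical building-block catalog and reaction arities.
BUILDING_BLOCKS = ["CCN", "OC(=O)c1ccc(Br)cc1", "OB(O)c1ccccc1"]
ARITY: Dict[str, int] = {"amide": 2, "suzuki": 2}


def nearest_block(smiles: str, blocks: List[str]) -> str:
    """Snap a generated SMILES string to the most similar catalog entry.

    String similarity is a crude stand-in for a real nearest-neighbor search.
    """
    return max(blocks, key=lambda b: SequenceMatcher(None, smiles, b).ratio())


def toy_executor(reaction: str, reactants: List[str]) -> str:
    """Placeholder for a reaction executor; returns a symbolic product."""
    return f"{reaction}({','.join(reactants)})"


def run_pathway(tokens: List[str], blocks: List[str]) -> List[str]:
    """Process building-block and reaction tokens with a stack."""
    stack: List[str] = []
    for tok in tokens:
        if tok.startswith("[RXN:"):
            name = tok[5:-1]
            n = ARITY[name]
            if len(stack) < n:
                raise ValueError(f"not enough reactants for {name}")
            reactants = stack[-n:]
            del stack[-n:]
            stack.append(toy_executor(name, reactants))
        else:
            # Generated strings may not be purchasable; retrieve the closest block.
            stack.append(nearest_block(tok, blocks))
    return stack


if __name__ == "__main__":
    generated = ["OC(=O)c1ccc(Br)cc1", "NCC", "[RXN:amide]",
                 "OB(O)c1ccccc2", "[RXN:suzuki]"]
    print(run_pathway(generated, BUILDING_BLOCKS))
```

Two of the generated strings ("NCC" and the malformed boronic acid) are not in the catalog, and the retrieval step snaps each to its nearest entry before the reaction is applied. A pathway is complete when the stack holds exactly one molecule.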

Experimental Results

Synthesizable Molecule Reconstruction

ReaSyn demonstrates superior reconstruction rates and pathway diversity compared to SynNet and SynFormer across Enamine, ChEMBL, and ZINC250k test sets. Notably, on the challenging ZINC250k set with unseen building blocks, ReaSyn achieves a reconstruction rate of 41.2%, substantially outperforming baselines. The model also exhibits high diversity in both pathways and building blocks, indicating robust explorability.

Goal-Directed Molecular Optimization

In goal-directed optimization tasks (TDC oracles, sEH binding affinity), ReaSyn integrated with Graph GA consistently outperforms synthesis-constrained baselines in both optimization score and synthetic accessibility (SA). For sEH, ReaSyn achieves a binding affinity of 0.97 and an SA score of 2.01, surpassing all prior methods and demonstrating high sampling efficiency.

Synthesizable Hit Expansion

For JNK3 hit expansion, ReaSyn achieves an analog rate of 50.0%, an improve rate of 13.1%, and a success rate of 11.3%, outperforming previous synthesizable projection methods. The distribution of generated analogs shows that ReaSyn can simultaneously optimize for similarity and target property.

Figure 4: Distribution of JNK3 scores and analog similarity for SynFormer and ReaSyn, illustrating improved hit expansion performance.

Figure 5: Examples of hit molecules and generated synthesizable analogs by ReaSyn in JNK3 hit expansion, with inhibition scores and similarity metrics.

Pathway Generation Examples

ReaSyn generates diverse synthetic pathways for molecule reconstruction, as illustrated in the Enamine examples.

Figure 6: Examples of synthetic pathways generated by ReaSyn in synthesizable molecule reconstruction of Enamine molecules.

Ablation and Comparative Analysis

Ablation studies confirm that both RL finetuning and the inclusion of intermediate product tokens in CoR notation are critical for pathway diversity and reconstruction performance. The use of SMILES over molecular fingerprints is essential for sequence length and token balance. Comparative analysis with retrosynthesis planning methods (e.g., AiZynthFinder) shows that ReaSyn achieves higher reconstruction rates on Enamine and ZINC250k, despite a smaller design space, and offers broader applicability beyond retrosynthesis.

Implications and Future Directions

ReaSyn's explicit reasoning over synthetic pathways, enabled by CoR notation and RL finetuning, sets a new standard for synthesizable molecular design. The framework's modularity allows integration with various generative and optimization algorithms, facilitating practical deployment in drug discovery pipelines. The approach demonstrates that stepwise reasoning and intermediate supervision are crucial for navigating combinatorially large chemical spaces.

Potential future developments include the incorporation of additional reaction metadata (e.g., reagents, yields), toxicity filtering, and further scaling of reward models for property optimization. The analogy to LLM reasoning suggests that advances in process reward modeling and search algorithms can be directly transferred to molecular design, opening avenues for more sophisticated reasoning and exploration strategies.

Conclusion

ReaSyn introduces a principled framework for synthesizable molecular projection, leveraging chain-of-reaction reasoning to achieve high coverage, diversity, and optimization performance in the synthesizable chemical space. The explicit encoding of reaction steps and intermediates, combined with RL finetuning and goal-directed search, enables ReaSyn to outperform prior methods in reconstruction, optimization, and hit expansion tasks. The framework's design and empirical results underscore the importance of stepwise reasoning and dense supervision in generative molecular modeling, with significant implications for the future of AI-driven drug discovery.

