- The paper reports that ReaSyn's Chain-of-Reaction (CoR) notation improves synthesizable molecule reconstruction by providing explicit stepwise supervision.
- The framework combines a Transformer-based encoder-decoder with RL finetuning via GRPO to increase pathway diversity and reconstruction rates, e.g., 41.2% on ZINC250k.
- The modular design and explicit reaction encoding support goal-directed molecular optimization and hit expansion in AI-driven drug discovery.
Synthesizable Molecular Design via Chain-of-Reaction Reasoning: The ReaSyn Framework
Generating synthetically accessible molecules remains a central challenge in computational drug discovery. Molecular generative models frequently propose candidates that cannot be made under practical laboratory constraints, largely because synthesizability is neglected during multi-objective optimization. Heuristic synthesizability scores and design space constraints have been proposed, but these approaches either fail to capture the non-linear nature of synthesizability or sacrifice explorability and optimization performance. Synthesizable projection, which generates synthetic pathways to synthesizable analogs that are structurally similar to a given molecule, offers a modular solution. Prior methods, however, have not fully leveraged the chemical information embedded in synthetic pathways, resulting in limited coverage and poor optimization.
The ReaSyn Framework and Chain-of-Reaction Notation
ReaSyn introduces a generative framework for synthesizable projection, leveraging a Transformer-based encoder-decoder architecture. The key innovation is the Chain-of-Reaction (CoR) notation, which represents synthetic pathways as explicit sequences of reactants, reaction types, and intermediate products, analogous to chain-of-thought (CoT) reasoning in LLMs. This notation enables dense supervision at every reaction step, facilitating explicit learning of chemical reaction rules and stepwise reasoning during both training and inference.
Figure 1: The ReaSyn framework utilizes an encoder-decoder Transformer to generate synthetic pathways in CoR notation, with intermediate supervision and reasoning via a reaction executor.
The CoR notation contrasts with the previously used postfix notation, which relies on hierarchical classification and molecular fingerprints, leading to information loss and error accumulation. By representing molecules directly in SMILES and including intermediates, CoR provides a unified token vocabulary and removes the need for hierarchical prediction, improving both expressivity and robustness.
Figure 2: Comparison of postfix and CoR notations; CoR explicitly encodes intermediates and reactions, enabling stepwise reasoning and eliminating hierarchical prediction.
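To make the difference between the two notations concrete, the following is a minimal sketch of how one two-step pathway might be serialized in each. The `Step` class, the reaction labels such as `<rxn:amide>`, and the exact token order are illustrative assumptions, not the paper's actual vocabulary; SMILES strings are also kept whole here rather than split into tokens.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    new_blocks: List[str]  # SMILES of building blocks introduced at this step
    reaction: str          # hypothetical reaction label (assumption)
    product: str           # SMILES of the product of this step


def to_postfix(steps: List[Step]) -> List[str]:
    """Postfix-style sequence: building blocks and reactions only."""
    tokens: List[str] = []
    for s in steps:
        tokens.extend(s.new_blocks)
        tokens.append(s.reaction)
    return tokens


def to_cor(steps: List[Step]) -> List[str]:
    """CoR-style sequence: each reaction is followed by its product,
    so every intermediate appears explicitly in the sequence."""
    tokens: List[str] = []
    for s in steps:
        tokens.extend(s.new_blocks)
        tokens.append(s.reaction)
        tokens.append(s.product)
    return tokens


# Toy pathway: amide coupling, then esterification of the free alcohol.
steps = [
    Step(["CC(=O)O", "NCCO"], "<rxn:amide>", "CC(=O)NCCO"),
    Step(["OC(=O)c1ccccc1"], "<rxn:ester>", "CC(=O)NCCOC(=O)c1ccccc1"),
]

print(to_postfix(steps))
print(to_cor(steps))
```

In this sketch the CoR sequence is longer by exactly one token per reaction step. Those extra tokens are the intermediates and the final product, which is where the per-step supervision signal comes from.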
Training Paradigm: Supervised and RL Finetuning
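ReaSyn employs a two-stage training protocol: supervised training on synthetic pathways written in CoR notation, followed by RL finetuning with GRPO.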
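As a rough illustration of the RL stage, GRPO scores each sampled output relative to the other samples drawn for the same input, instead of using a learned value baseline. The sketch below shows only that group-relative advantage computation. The reward values are made up, and what the reward measures (for instance, how close a pathway's product is to the input molecule) is an assumption that this summary does not specify.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of samples for the same input.

    Each advantage is (reward - group mean) / (group std + eps).
    Population std is used here; implementations differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four pathways sampled for one target molecule.
rewards = [0.2, 0.9, 0.5, 0.4]
print(group_relative_advantages(rewards))
```

Samples that beat their group's average receive a positive advantage and are reinforced, while below-average samples are pushed down. A group in which every sample earns the same reward yields zero advantage and therefore no learning signal.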
Inference and Test-Time Compute Scaling
During inference, ReaSyn maintains a stack for each pathway, executing reactions and retrieving building blocks via nearest-neighbor search in the SMILES space. Beam search is employed to explore multiple candidate pathways, with scoring functions tailored to the task (reconstruction, optimization, or hit expansion). For goal-directed tasks, a reward model (e.g., a neural property predictor) guides the search, enabling test-time compute scaling analogous to best-of-N sampling and process reward models in LLM reasoning.
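The inference loop described above can be sketched in a few lines, under heavy simplifying assumptions. Here `difflib` string similarity stands in for the nearest-neighbor search over SMILES, a lookup table stands in for a real reaction executor, and the building-block library, reaction label, and all function names are hypothetical. The generic `beam_search` helper takes the task-specific scoring function as a parameter.

```python
import difflib
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical building-block library (SMILES strings).
BUILDING_BLOCKS = ["CC(=O)O", "NCCO", "OC(=O)c1ccccc1", "CN"]

# Hypothetical reaction labels mapped to the number of reactants they consume.
ARITY = {"<rxn:amide>": 2}

# Stand-in for a reaction executor: a lookup table, not real chemistry.
PRODUCTS = {("<rxn:amide>", ("CC(=O)O", "NCCO")): "CC(=O)NCCO"}


def nearest_block(query: str, library: Sequence[str] = BUILDING_BLOCKS) -> str:
    """Snap a generated SMILES string to the most similar library entry."""
    return max(library, key=lambda bb: difflib.SequenceMatcher(None, query, bb).ratio())


def execute(tokens: Sequence[str]) -> Optional[str]:
    """Run a pathway with a stack; return the product, or None if it is invalid."""
    stack: List[str] = []
    for tok in tokens:
        if tok in ARITY:
            n = ARITY[tok]
            if len(stack) < n:
                return None  # not enough reactants on the stack
            reactants = tuple(stack[-n:])
            del stack[-n:]
            product = PRODUCTS.get((tok, reactants))
            if product is None:
                return None  # the toy executor does not know this reaction
            stack.append(product)
        else:
            stack.append(nearest_block(tok))  # retrieve a purchasable block
    return stack[-1] if stack else None


def beam_search(start: Tuple, expand: Callable, score: Callable,
                width: int, steps: int) -> List[Tuple]:
    """Keep the `width` best partial pathways after each expansion step."""
    beam = [start]
    for _ in range(steps):
        candidates = [c for state in beam for c in expand(state)]
        if not candidates:
            break
        beam = sorted(candidates, key=score, reverse=True)[:width]
    return beam


# A slightly corrupted SMILES ("0" instead of "O") is snapped to acetic acid.
print(execute(["CC(=O)0", "NCCO", "<rxn:amide>"]))
```

Swapping the `score` argument is what adapts the same search to reconstruction, optimization, or hit expansion. Widening the beam spends more compute at test time in exchange for exploring more candidate pathways.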
Experimental Results
Synthesizable Molecule Reconstruction
ReaSyn demonstrates superior reconstruction rates and pathway diversity compared to SynNet and SynFormer across Enamine, ChEMBL, and ZINC250k test sets. Notably, on the challenging ZINC250k set with unseen building blocks, ReaSyn achieves a reconstruction rate of 41.2%, substantially outperforming baselines. The model also exhibits high diversity in both pathways and building blocks, indicating robust explorability.
Goal-Directed Molecular Optimization
In goal-directed optimization tasks (TDC oracles, sEH binding affinity), ReaSyn integrated with Graph GA consistently outperforms synthesis-constrained baselines in both optimization score and synthetic accessibility (SA). For sEH, ReaSyn achieves a binding affinity of 0.97 and an SA score of 2.01, surpassing all prior methods and demonstrating high sampling efficiency.
Synthesizable Hit Expansion
For JNK3 hit expansion, ReaSyn achieves an analog rate of 50.0%, an improve rate of 13.1%, and a success rate of 11.3%, outperforming previous synthesizable projection methods. The distribution of generated analogs shows that ReaSyn can simultaneously optimize for similarity and target property.
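The three rates can be read as fractions over the generated molecules. The sketch below uses assumed definitions that this summary does not state: an "analog" has similarity to the hit at or above a threshold (0.6 here is a placeholder), an "improved" molecule scores higher than the hit, and a "success" satisfies both. The input data is invented for illustration.

```python
from typing import Dict, List, Tuple


def hit_expansion_rates(hit_score: float,
                        analogs: List[Tuple[float, float]],
                        sim_threshold: float = 0.6) -> Dict[str, float]:
    """Compute analog, improve, and success rates under assumed definitions.

    `analogs` holds (similarity_to_hit, property_score) pairs.
    `sim_threshold` is a placeholder value, not taken from the paper.
    """
    n = len(analogs)
    if n == 0:
        return {"analog": 0.0, "improve": 0.0, "success": 0.0}
    is_analog = [sim >= sim_threshold for sim, _ in analogs]
    is_improved = [score > hit_score for _, score in analogs]
    both = [a and b for a, b in zip(is_analog, is_improved)]
    return {
        "analog": sum(is_analog) / n,
        "improve": sum(is_improved) / n,
        "success": sum(both) / n,
    }


# Invented example: four generated molecules for a hit with score 0.5.
toy = [(0.8, 0.7), (0.7, 0.3), (0.4, 0.9), (0.2, 0.1)]
print(hit_expansion_rates(0.5, toy))
```

Under these definitions the success rate can never exceed either of the other two rates, which is consistent with the reported 11.3% sitting below both 50.0% and 13.1%.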

Figure 4: Distribution of JNK3 scores and analog similarity for SynFormer and ReaSyn, illustrating improved hit expansion performance.
Figure 5: Examples of hit molecules and generated synthesizable analogs by ReaSyn in JNK3 hit expansion, with inhibition scores and similarity metrics.
Pathway Generation Examples
ReaSyn generates diverse synthetic pathways for molecule reconstruction, as illustrated in the Enamine examples.
Figure 6: Examples of synthetic pathways generated by ReaSyn in synthesizable molecule reconstruction of Enamine molecules.
Ablation and Comparative Analysis
Ablation studies indicate that both RL finetuning and the inclusion of intermediate product tokens in CoR notation are critical for pathway diversity and reconstruction performance. They also indicate that representing molecules as SMILES rather than molecular fingerprints matters for sequence length and token balance. Comparative analysis with retrosynthesis planning methods (e.g., AiZynthFinder) shows that ReaSyn achieves higher reconstruction rates on Enamine and ZINC250k despite a smaller design space, and that it applies to tasks beyond retrosynthesis.
Implications and Future Directions
ReaSyn's explicit reasoning over synthetic pathways, enabled by CoR notation and RL finetuning, sets a new standard for synthesizable molecular design. The framework's modularity allows integration with various generative and optimization algorithms, facilitating practical deployment in drug discovery pipelines. The approach demonstrates that stepwise reasoning and intermediate supervision are crucial for navigating combinatorially large chemical spaces.
Potential future developments include the incorporation of additional reaction metadata (e.g., reagents, yields), toxicity filtering, and further scaling of reward models for property optimization. The analogy to LLM reasoning suggests that advances in process reward modeling and search algorithms can be directly transferred to molecular design, opening avenues for more sophisticated reasoning and exploration strategies.
Conclusion
ReaSyn introduces a principled framework for synthesizable molecular projection, leveraging chain-of-reaction reasoning to achieve high coverage, diversity, and optimization performance in the synthesizable chemical space. The explicit encoding of reaction steps and intermediates, combined with RL finetuning and goal-directed search, enables ReaSyn to outperform prior methods in reconstruction, optimization, and hit expansion tasks. The framework's design and empirical results underscore the importance of stepwise reasoning and dense supervision in generative molecular modeling, with significant implications for the future of AI-driven drug discovery.