ReaSyn: Synthesizable Molecule Projection
- ReaSyn is a generative modeling framework that explicitly forms stepwise synthetic pathways using chain-of-reaction notation, ensuring molecules are synthesizable.
- It integrates supervised learning with reinforcement learning finetuning to efficiently explore chemical space and optimize molecular properties.
- By tokenizing intermediate reaction products and employing dense per-step supervision, ReaSyn addresses the synthesizability limitations of conventional generative models, achieving high reconstruction rates and pathway diversity.
ReaSyn is a generative modeling framework for synthesizable molecule projection, designed to address the poor synthesizability of molecules proposed by conventional molecular generative models. It explores the combinatorially large neighborhood of an input molecule within synthesizable chemical space by generating explicit synthetic pathways, structured as stepwise reaction sequences, whose end products are synthesizable analogs. The core methodological innovation of ReaSyn is its "chain-of-reaction" (CoR) notation, a formalism inspired by chain-of-thought reasoning in LLMs, which lets the model reason explicitly and step by step about chemistry during both training and inference. Supervised learning with dense per-step supervision, reinforcement learning (RL) finetuning, and test-time compute scaling together allow efficient exploration and optimization for synthesizability and property-based tasks in molecular design (Lee et al., 19 Sep 2025).
1. Motivation and Problem Setting
Traditional molecular generative models frequently propose molecules that are not synthetically accessible, because the combinatorial space is vast and the models carry no explicit constraints on synthesizability. Existing attempts to mitigate this, such as heuristic synthesizability metrics or hard restrictions on the search space, result in limited coverage and poor optimization performance for molecular properties. ReaSyn instead formulates molecule generation as the explicit generation of synthetic pathways: for a given input molecule, it generates a pathway in which each intermediate and reaction is explicitly represented, so that any proposed molecule is accessible via a sequence of known chemical transformations starting from purchasable building blocks.
A key conceptual advance is the analogy between chain-of-thought (CoT) prompting in LLMs and stepwise chemical synthesis: just as CoT decomposes reasoning into readable steps, chain-of-reaction (CoR) decomposes molecule synthesis into interpretable, verifiable reaction steps.
2. Chain-of-Reaction (CoR) Notation: Formalism and Implementation
The CoR notation expresses a synthetic pathway as a sequence of discrete blocks, with each block corresponding to a single reaction:
- Block Structure: Each block includes the reactant(s) (building blocks or intermediates), a reaction token (indexing one of 115 possible reaction types), and the resulting intermediate product. This unified tokenization employs SMILES strings for molecules, special markers (e.g., [MOL:START], [MOL:END]), and reaction tokens, all from a shared vocabulary.
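- Sequencing: A pathway $p$ is mathematically represented as:
$$p = b_1 \oplus b_2 \oplus \cdots \oplus b_n$$
where $b_i$ is the block for the $i$-th reaction, $\oplus$ denotes sequence concatenation, and $n$ is the number of reaction steps.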
- Supervision: Unlike hierarchical or postfix notations, CoR embeds both the reactants and every intermediate product directly in the sequence, affording dense supervision at every reaction step. This enables the model to acquire explicit knowledge of chemical reaction rules and reduces error propagation compared to end-to-end or “one-shot” approaches.
This notation permits step-level alignment between predicted reaction steps and gold standard synthetic routes, facilitating fine-grained error analysis and interpretability during both training and deployment.
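The block structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tokenizer: the `[MOL:START]` and `[MOL:END]` markers come from the text, while the `ReactionStep` class, the `[RXN:k]` token spelling, and the character-level split of SMILES strings are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

MOL_START = "[MOL:START]"  # marker named in the text
MOL_END = "[MOL:END]"      # marker named in the text


@dataclass
class ReactionStep:
    """Hypothetical container for one reaction step of a pathway."""
    reactants: List[str]  # SMILES of building blocks or earlier intermediates
    reaction_id: int      # index into the reaction set (115 types per the text)
    product: str          # SMILES of the intermediate produced by this step


def mol_tokens(smiles: str) -> List[str]:
    """Wrap a SMILES string in molecule markers (character-level split is assumed)."""
    return [MOL_START, *smiles, MOL_END]


def encode_block(step: ReactionStep) -> List[str]:
    """One CoR block: reactant(s), then the reaction token, then the product."""
    tokens: List[str] = []
    for smi in step.reactants:
        tokens += mol_tokens(smi)
    tokens.append(f"[RXN:{step.reaction_id}]")  # hypothetical token spelling
    tokens += mol_tokens(step.product)
    return tokens


def encode_pathway(steps: List[ReactionStep]) -> List[str]:
    """Concatenate the blocks of all steps into a single token sequence."""
    sequence: List[str] = []
    for step in steps:
        sequence += encode_block(step)
    return sequence


if __name__ == "__main__":
    # Acetic acid + methylamine -> N-methylacetamide; reaction index 7 is arbitrary.
    step = ReactionStep(["CC(=O)O", "CN"], 7, "CC(=O)NC")
    print(" ".join(encode_pathway([step])))
```

Because every block ends with its intermediate product, a per-token training loss over this sequence supervises the outcome of each reaction step, which is the dense supervision the notation is meant to provide.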
3. Model Architecture and Stepwise Reasoning
ReaSyn implements a Transformer-based encoder–decoder model which autoregressively generates the CoR sequence. Its operation can be summarized as follows:
- Encoding: The Transformer encoder encodes the input molecule. Conditioned on this encoding, the decoder autoregressively generates the CoR sequence, predicting at each step the next token, which belongs to a reactant, a reaction type, or an intermediate.
- Stepwise Reasoning: The autoregressive process, coupled with CoR notation, enables explicit “reasoning” through each reaction. Unlike standard generative models that emit the molecule in a single pass, ReaSyn traverses the chemical space incrementally, evaluating the validity of each transformation.
- Learning Reaction Rules: The explicit recording of intermediates lets the model learn, at each step, the mapping from the input reactants and the chosen reaction to the output product, which is analogous to learning mechanistic steps in retrosynthesis.
By integrating intermediate products as tokens, models trained on CoR supervision can generalize not only molecule–path mappings but also the underlying chemical reaction principles, as shown empirically by their ability to reconstruct molecules and propose diverse pathways.
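The autoregressive generation described in this section can be illustrated with a toy decoding loop. This is a sketch under stated assumptions: `greedy_decode`, the `[END]` token, and the `scripted_scores` stand-in are hypothetical names for illustration, and a real system would replace the scripted scorer with the Transformer decoder's next-token distribution (and likely use sampling or beam search rather than greedy selection).

```python
from typing import Callable, Dict, List

END = "[END]"  # hypothetical end-of-pathway token

# A scorer maps (target SMILES, tokens generated so far) to next-token scores.
ScoreFn = Callable[[str, List[str]], Dict[str, float]]


def greedy_decode(score_fn: ScoreFn, target: str, max_len: int = 64) -> List[str]:
    """Append the highest-scoring next token until END is chosen or max_len is hit."""
    prefix: List[str] = []
    for _ in range(max_len):
        scores = score_fn(target, prefix)
        token = max(scores, key=scores.get)
        if token == END:
            break
        prefix.append(token)
    return prefix


def scripted_scores(script: List[str]) -> ScoreFn:
    """Toy stand-in for the decoder: always favors the next token of a fixed script."""
    def score_fn(target: str, prefix: List[str]) -> Dict[str, float]:
        if len(prefix) >= len(script):
            return {END: 1.0}
        return {script[len(prefix)]: 1.0, END: 0.5}
    return score_fn


if __name__ == "__main__":
    script = ["[MOL:START]", "C", "N", "[MOL:END]", "[RXN:7]"]
    print(greedy_decode(scripted_scores(script), target="CC(=O)NC"))
```

The point of the sketch is the control flow: each token is chosen conditioned on the target molecule and on everything generated so far, so reactants, the reaction choice, and the intermediate product are all committed to in order.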
4. Reinforcement Learning Enhancement and Goal-Directed Scalability
To augment the model's reasoning and optimization capabilities, ReaSyn is finetuned via reinforcement learning:
- RL Objective: For a target molecule $x$ and a generated pathway $p$, the reward is defined as $R(p) = \mathrm{sim}(x, \hat{x}_p)$, where $\hat{x}_p$ is the end product of $p$ and “sim” denotes Tanimoto similarity (or another domain-appropriate metric) between the end product of the generated pathway and the target.
- KL Regularization: A Kullback–Leibler (KL) divergence penalty, comparing the RL-optimized model to the supervised baseline, prevents catastrophic drift from chemically valid routes.
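- GRPO Algorithm: RL is implemented via Group Relative Policy Optimization (GRPO), which samples a group of pathways per target and scores each relative to the rest of the group. This is designed to encourage exploration across multiple diverse, valid solution paths rather than collapsing onto a single path.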
- Compute Scaling: Goal-directed test-time compute scaling is incorporated to dynamically allocate greater inference resources for difficult projection tasks, allowing the model to shift from exploiting known pathways to more intensive exploration when necessary.
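The reward and the group-relative scoring in the list above can be sketched as follows. This is an illustration rather than the authors' training code: fingerprints are represented as plain sets of on-bit indices instead of real Morgan fingerprints, the function names are hypothetical, and the KL penalty and the policy-gradient update itself are omitted. Standardizing rewards within a sampled group is the general idea behind GRPO; the use of the population standard deviation here is an assumption.

```python
from statistics import mean, pstdev
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one group of pathways sampled for the same target."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    target_fp = {1, 2, 3, 4}
    # Fingerprints of the end products of three sampled pathways (made-up bits).
    product_fps = [{1, 2, 3, 4}, {1, 2}, {5, 6}]
    rewards = [tanimoto(target_fp, fp) for fp in product_fps]
    print(rewards, group_relative_advantages(rewards))
```

Pathways whose end product is closer to the target than the group average receive a positive advantage and are reinforced; those below average are discouraged. No separate value network is needed, since the group itself supplies the baseline.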
These RL and scaling strategies are applied in both reconstructing known molecules (maximizing exact synthesis reconstruction) and goal-directed optimizations (e.g., property-driven design while guaranteeing synthetic accessibility).
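One simple way to picture test-time compute scaling is a sampling budget that grows only while the projection remains unsolved. The sketch below is an assumption-laden illustration, not the procedure used in ReaSyn: the doubling schedule, the `project_with_scaling` name, the threshold, and the sampler interface are all hypothetical.

```python
from typing import Callable, List, Tuple

# A sampler takes a count n and returns n (pathway, similarity-to-target) pairs.
Sampler = Callable[[int], List[Tuple[str, float]]]


def project_with_scaling(sample: Sampler, threshold: float = 1.0,
                         start: int = 4, max_budget: int = 64):
    """Double the number of sampled pathways until one meets the threshold
    or the total budget is spent. Returns (best candidate, samples used)."""
    best: Tuple[str, float] = ("", -1.0)
    used, n = 0, start
    while used < max_budget:
        n = min(n, max_budget - used)  # never exceed the remaining budget
        for cand in sample(n):
            if cand[1] > best[1]:
                best = cand
        used += n
        if best[1] >= threshold:
            break  # easy target: stop early and save compute
        n *= 2     # hard target: spend more on the next round
    return best, used


if __name__ == "__main__":
    easy = lambda n: [("pathway-A", 1.0)] * n  # toy sampler that always succeeds
    hard = lambda n: [("pathway-B", 0.3)] * n  # toy sampler that never succeeds
    print(project_with_scaling(easy))
    print(project_with_scaling(hard))
```

With the toy samplers, the easy target stops after the first round of 4 samples, while the hard target consumes the full budget of 64 and returns its best partial match.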
5. Evaluation Metrics and Experimental Performance
Performance is assessed on synthesizable molecule reconstruction, hit expansion, and goal-directed optimization tasks, with the following key quantitative metrics:
Metric | Definition/Context
---|---
Reconstruction Rate | Proportion of input molecules for which a generated pathway yields exactly the input molecule
Molecular Similarity | Tanimoto similarity on Morgan fingerprints (with possible augmentation)
Pathway Diversity | Number/entropy of distinct synthetic pathways generated per molecule
Building Block (BB) Diversity | Variety of unique input building blocks proposed across projection tasks
Optimization Score (e.g., binding affinity) | Measure of molecular optimization on an auxiliary property (e.g., sEH, JNK3)
Results indicate that ReaSyn achieves a reconstruction rate of approximately 76.8% on the Enamine dataset, outperforming SynFormer (63.5%) and SynNet (25.2%). In synthesizable hit expansion and goal-directed optimization, ReaSyn consistently yields higher optimization performance and pathway/building block diversity.
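Two of the metrics in the table above can be computed with a few lines of Python. This is a sketch of plausible definitions, not the paper's evaluation code: it assumes molecules are compared as canonical SMILES strings, that each pathway is identified by a string, and that entropy is taken in nats; the function names are hypothetical.

```python
from collections import Counter
from math import log
from typing import List, Sequence


def reconstruction_rate(inputs: Sequence[str], products: Sequence[str]) -> float:
    """Fraction of inputs whose generated end product matches the input exactly."""
    hits = sum(1 for x, y in zip(inputs, products) if x == y)
    return hits / len(inputs)


def pathway_entropy(pathways: List[str]) -> float:
    """Shannon entropy (nats) of the empirical distribution over distinct pathways."""
    counts = Counter(pathways)
    total = len(pathways)
    return -sum((c / total) * log(c / total) for c in counts.values())


if __name__ == "__main__":
    inputs = ["A", "B", "C", "D"]    # placeholder identifiers, not real SMILES
    products = ["A", "B", "X", "D"]  # end products of the best pathway per input
    print(reconstruction_rate(inputs, products))
    print(pathway_entropy(["p1", "p2", "p2", "p3"]))
```

In the toy call, three of four inputs are reconstructed, giving a rate of 0.75. Entropy is zero when every sample is the same pathway and grows as the samples spread over more distinct pathways.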
6. Comparative Advantages and Methodological Differentiators
Relative to prior approaches:
- Coverage: By reasoning through explicit synthetic pathways, ReaSyn is able to explore a larger swath of synthesizable space than models relying on post-hoc scoring or template matching.
- Diversity: The explicit per-step supervision and RL-guided diversity avoid mode collapse, resulting in a broader proposal set for each input molecule, including novel synthesizable analogs.
- Synthesis Assurance: Since each output is constructed from validated reactant–reaction tuples, synthesizability is guaranteed by construction rather than estimated.
- Generalizability: The methodology is competitive even when operating with a smaller set of building blocks and reactions, demonstrating efficient knowledge utilization and minimal dependency on large reagent libraries.
These advances suggest that the explicit chain-of-reaction paradigm not only improves practical synthetic tractability but also provides a framework for future developments in interpretable and mechanistically informed molecular generation.
7. Prospective Directions and Broader Implications
The conceptual and empirical benefits of ReaSyn indicate several directions for future research and application:
- Drug Discovery: By linking every generated molecule to an executable synthetic route, ReaSyn stands to facilitate rapid, property-optimized discovery and lead optimization.
- Hybrid Mechanistic–Generative Systems: The analogy of chemical synthesis to stepwise reasoning in LLMs (via CoR/CoT) provides a roadmap for integrating further mechanistic chemical information, including reagents, yields, and reaction conditions, into generative models.
- Scientific Discovery Domains: The chain-of-reaction reasoning model may inspire similar stepwise, interpretable generation approaches in other domains where process-level compositionality and intermediate validation are critical.
- Retrosynthesis and Beyond: Although developed for synthesizable projection, the methodology is extensible to other inverse design settings, potentially closing the gap between generative design and practical realization.
In sum, ReaSyn operationalizes a reasoning-centric approach to synthesizable molecule generation, combining explicit stepwise pathway generation, dense supervision, and RL-based optimization. This combination yields improved results over prior methods in both reconstructive and discovery-oriented tasks and establishes a new direction in synthesizability-anchored generative models (Lee et al., 19 Sep 2025).