
RMechDB: Radical Mechanism Database

Updated 25 May 2026
  • RMechDB is a publicly available, expertly curated database of roughly 5,500 fully balanced, atom-mapped radical elementary steps that encode explicit electron flow and orbital details.
  • It employs the OrbChain formalism to represent one-electron movements with fish-hook arrows and ensures mechanistic consistency through rigorous atom mapping.
  • The database underpins interpretable ML predictors like RMechRP and supports pathway enumeration in applications ranging from atmospheric chemistry to synthetic planning.

RMechDB is a publicly available, expertly curated database of elementary radical reaction steps, purpose-built to encode mechanistic detail (including atom mapping and explicit radical arrow-pushing) that is absent from broad, patent-derived chemical reaction corpora. It supports both mechanistic modeling and benchmarking of deep-learning predictors specific to radical chemistry, annotating molecular orbitals, mechanistic steps, and electron flow for roughly 5,500 reactions spanning textbook paradigms and modern atmospheric chemistry (Tavakoli et al., 2023).

1. Composition, Scope, and Curation

RMechDB (Radical Mechanism Database) v1.0 comprises ~5,500 fully balanced, atom-mapped radical elementary steps. Reactions are sourced from both canonical textbook radical mechanisms—such as homolytic cleavages, radical additions, and hydrogen abstractions—and from primary literature detailing atmospheric oxidation processes (notably those involving hydroxyl, peroxy, and alkoxy radicals). The database catalogs over 1,000 unique radical species, including organic, inorganic, and mixed-phase radicals.

Every entry is labeled by:

  • Defined reaction class (e.g., H-abstraction, addition, β-scission, recombination)
  • Number of fish-hook arrows (always two per elementary step)
  • Reactive molecular orbital (MO) pair, with explicit topological and electronic center annotations

Reactions are represented using the OrbChain formalism, comprising:

  • R and P as reactant and product molecular graphs (nodes: atom labels, edges: bond orders)
  • A as a set of directed half-arrows (one-electron fish-hook arrows encoding electron flow)
  • Atom mappings from R to P, with unique arrow codes specifying MO identities
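
The (R, P, A) triple above can be pictured as a small data structure. The following is a minimal sketch, not the database's actual schema: the class names, the tuple-based arrow encoding, and the chlorine homolysis example are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HalfArrow:
    """One-electron fish-hook arrow (hypothetical encoding).

    A source or target is a tuple of atom-map indices: one index for an
    atom-centred electron, two indices for a bond.
    """
    source: tuple
    target: tuple


@dataclass
class MolGraph:
    atoms: dict   # atom-map index -> element symbol
    bonds: dict   # frozenset({i, j}) -> bond order


@dataclass
class ElementaryStep:
    reactants: MolGraph   # R
    products: MolGraph    # P
    arrows: list          # A

    def changed_bonds(self) -> set:
        """Bonds whose order differs between R and P (a missing bond counts as order 0)."""
        keys = set(self.reactants.bonds) | set(self.products.bonds)
        return {
            k for k in keys
            if self.reactants.bonds.get(k, 0) != self.products.bonds.get(k, 0)
        }

    def arrows_reference_known_atoms(self) -> bool:
        """Every index mentioned by an arrow must be a mapped reactant atom."""
        known = set(self.reactants.atoms)
        return all(
            set(a.source) <= known and set(a.target) <= known
            for a in self.arrows
        )


# Illustrative example: homolytic cleavage Cl-Cl -> Cl. + Cl.
step = ElementaryStep(
    reactants=MolGraph(atoms={1: "Cl", 2: "Cl"}, bonds={frozenset({1, 2}): 1}),
    products=MolGraph(atoms={1: "Cl", 2: "Cl"}, bonds={}),
    arrows=[HalfArrow(source=(1, 2), target=(1,)),
            HalfArrow(source=(1, 2), target=(2,))],
)
```

Here each of the two half-arrows starts at the Cl–Cl bond and ends on one chlorine atom, so the only bond reported by `changed_bonds()` is the one being cleaved.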

Curation by expert chemists included manual verification of atom mapping, arrow-pushing mechanisms, adherence to mass/electron balance, and consistency checks (e.g., avoidance of topologies violating Bredt’s rule) (Tavakoli et al., 2023).

2. Data Structure, Representation, and Accessibility

RMechDB is distributed in both machine- and human-readable forms:

  • JSON: for each entry, provides atom-mapped SMILES for reactants/products (with isotopic radical labels), arrow-code strings (e.g., "2 ⟨– 1 ⟨–"), and OrbChain MO descriptors.
  • SDF (RDKit-compatible): enables substructure search and molecular fingerprinting.
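
As an illustration of how a JSON entry of this kind might be consumed, here is a minimal sketch using only the Python standard library. The field names (`reactants`, `products`, `arrow_code`) and the example record are hypothetical; the actual RMechDB schema may differ, and the arrow-code value is a placeholder rather than a real code.

```python
import json

# Hypothetical record; key names and values are illustrative assumptions.
raw = """
{
  "reactants": "[Cl:1][Cl:2]",
  "products": "[Cl:1].[Cl:2]",
  "arrow_code": "<arrow-code string>"
}
"""

REQUIRED_KEYS = {"reactants", "products", "arrow_code"}


def load_entry(text: str) -> dict:
    """Parse one entry and split the mapped SMILES into individual species."""
    entry = json.loads(text)
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        raise KeyError(f"entry is missing fields: {sorted(missing)}")
    # In SMILES, '.' separates disconnected species.
    entry["reactant_species"] = entry["reactants"].split(".")
    entry["product_species"] = entry["products"].split(".")
    return entry


entry = load_entry(raw)
```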

The database is partitioned into standardized training and test splits, used by Tavakoli et al. and subsequent benchmarking efforts:

| Subset | Train | Test |
|---|---|---|
| Core (textbook) | 1,512 | 150 |
| Atmospheric (specific) | 3,397 | 367 |
| Combined | 4,909 | 517 |

No separate validation set is provided; five-fold cross-validation is performed during hyperparameter tuning. Preprocessing involves canonicalization of SMILES, removal of atom mapping and arrow codes for text-based models, and generation of negative samples for contrastive learning (Tavakoli et al., 2023).
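
Two of the preprocessing steps described above, removing atom-map labels for text-based models and building five cross-validation folds, can be sketched with the standard library alone. This is an illustrative sketch, not the authors' pipeline: the function names are hypothetical, and SMILES canonicalization (which needs a cheminformatics toolkit) and negative sampling are not shown.

```python
import random
import re

# Matches the ':<digits>' atom-map suffix directly before a closing bracket.
_ATOM_MAP = re.compile(r":\d+(?=\])")


def strip_atom_maps(smiles: str) -> str:
    """Remove atom-map labels, e.g. '[CH3:1]' -> '[CH3]'."""
    return _ATOM_MAP.sub("", smiles)


def k_fold_indices(n_items: int, k: int = 5, seed: int = 0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


unmapped = strip_atom_maps("[CH3:1][H:2].[OH:3]")
splits = list(k_fold_indices(10, k=5))
```

Each item appears in exactly one validation fold, so every training example is used for validation once across the five runs.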

3. Mechanistic Encoding: OrbChain Formalism

Each mechanistic step is fully specified at the orbital level:

  • Fish-hook arrows represent individual one-electron movements.
  • Atom mapping is explicit for every atom, ensuring mechanistic and mass/electron balance.
  • Reactive MO pairs: For each step, the involved MOs (specified by atom, environment, electron count, and connectivity) are annotated, ensuring a bijective mapping between reactants, products, electron movement, and orbital transformations.

This explicit mechanistic mapping enables unambiguous translation between graph-level reaction representations and quantum-chemical mechanistic models.
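
One consequence of explicit atom mapping is that atom balance can be checked mechanically from the mapped SMILES strings. The sketch below assumes every atom is written as a bracket atom with a map label; the regular expression is a simplification that ignores some SMILES features (for example, two-letter aromatic symbols), and the hydrogen-abstraction example is illustrative only.

```python
import re

# Bracket atom with optional isotope, element symbol, and ':<map index>'.
_MAPPED_ATOM = re.compile(r"\[(?:\d+)?([A-Z][a-z]?|[a-z])[^\]:]*:(\d+)\]")


def mapped_atoms(smiles: str) -> dict:
    """Return {map index: element symbol}; reject duplicated map indices."""
    atoms = {}
    for symbol, idx in _MAPPED_ATOM.findall(smiles):
        idx = int(idx)
        if idx in atoms:
            raise ValueError(f"atom-map index {idx} is used more than once")
        atoms[idx] = symbol.capitalize()
    return atoms


def is_bijective_mapping(reactants: str, products: str) -> bool:
    """True if both sides carry the same map indices on the same elements."""
    return mapped_atoms(reactants) == mapped_atoms(products)


# Illustrative H-abstraction: CH4 + .OH -> .CH3 + H2O (only the transferred H is explicit).
balanced = is_bijective_mapping("[CH3:1][H:2].[OH:3]", "[CH3:1].[H:2][OH:3]")
unbalanced = is_bijective_mapping("[CH3:1][H:2].[OH:3]", "[CH3:1].[OH:3]")
```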

4. Benchmarking Radical Reaction Prediction: RMechRP

RMechDB underpins RMechRP, a radical mechanistic reaction predictor designed for high interpretability and mechanistic accuracy. RMechRP incorporates the following model types:

  • Two-step predictor: Identifies reactive sites via node classification (using atom descriptors or GNN) followed by mechanism ranking using a Siamese network.
  • Contrastive mechanistic learner: Scores pairs of atoms $(a_i, a_j)$ via two-tower MLPs and interaction scoring, trained with a contrastive loss:

$$\mathcal{L} = 1 - \sigma\left([f(a_1^*)g(a_2^*)] - [f(a_1')g(a_2')]\right)$$

where $\sigma(x) = 1/(1+e^{-x})$.

  • Rxn-Hypergraph attention model: Learns atomic embeddings directly from molecular hypergraphs, integrating with the same contrastive loss scheme.
  • Text-based (seq2seq) molecular transformer: Pretrained on USPTO data, fine-tuned using RMechDB entries, with SMILES-level tokenization; arrow codes are omitted in text-only models.
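
To make the contrastive loss in the list above concrete, the following sketch evaluates it for fixed embedding vectors. It rests on two assumptions that the text does not state explicitly: the product of the tower outputs is read as a dot product of embedding vectors, and the starred pair is taken to be the true reactive atom pair while the primed pair is a sampled negative. Function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v) -> float:
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss(f_pos, g_pos, f_neg, g_neg) -> float:
    """L = 1 - sigmoid(f(a1*)·g(a2*) - f(a1')·g(a2')).

    f_* and g_* are the two towers' embeddings of the first and second atom;
    'pos' is the assumed true pair, 'neg' the assumed negative pair.
    """
    return 1.0 - sigmoid(dot(f_pos, g_pos) - dot(f_neg, g_neg))


# Positive pair scores 1, negative pair scores 0.
loss_separated = contrastive_loss([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
# Both pairs score the same, so the model cannot tell them apart.
loss_tied = contrastive_loss([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0])
```

The loss is 0.5 when the two pairs receive equal scores and falls toward 0 as the positive pair's score exceeds the negative pair's score.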

For model evaluation, top-N "mechanistic-step accuracy" is the principal metric: the probability that the correct reaction mechanism is ranked in the top N predictions.
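
The top-N metric can be computed directly from ranked prediction lists. This is a minimal sketch with made-up labels; `top_n_accuracy` is a hypothetical helper, not a function from RMechRP.

```python
def top_n_accuracy(ranked_predictions, truths, n: int) -> float:
    """Fraction of examples whose true mechanism is among the first n predictions."""
    hits = sum(
        truth in ranked[:n]
        for ranked, truth in zip(ranked_predictions, truths)
    )
    return hits / len(truths)


# Toy data: three test reactions, each with a ranked list of candidate mechanisms.
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
truths = ["a", "a", "a"]

top1 = top_n_accuracy(ranked, truths, 1)
top2 = top_n_accuracy(ranked, truths, 2)
top3 = top_n_accuracy(ranked, truths, 3)
```

By construction the metric is non-decreasing in N, which is why top-5 figures always sit at or above top-1 figures.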

The performance metrics for the main models are:

| Model | Core Top-1 | Core Top-5 | Atmospheric Top-1 | Atmospheric Top-5 | Time (s) |
|---|---|---|---|---|---|
| Two-step (best) | 62.4% | 93.2% | 60.4% | 91.6% | 1.38 |
| Contrastive Rxn-HG | 64.3% | 95.1% | 62.1% | 94.1% | 1.45 |
| Contrastive Atom-desc | 62.9% | 94.2% | 61.0% | 93.0% | 0.08 |
| Seq2Seq fine-tuned | 57.7% | 83.9% | 57.1% | 82.2% | 1.30 |

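Contrastive Rxn-Hypergraph models achieve the highest top-1 and top-5 accuracy on both subsets. The two-step pipeline performs similarly to the contrastive atom-descriptor model but with much slower inference (1.38 s vs. 0.08 s). Text-only models trail the best model by roughly 5–7 percentage points at top-1 and 11–12 points at top-5, and do not improve with fine-tuning solely on RMechDB data (Tavakoli et al., 2023).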

5. Interpretability and Mechanistic Applications

RMechDB’s elementary-step granularity confers several interpretability features to trained predictors:

  • Orbital-level predictions: Each predicted mechanistic step directly specifies the orbital pair and fish-hook arrow, suitable for mapping to SMIRKS templates.
  • Pathway enumeration: Iterating single-step predictions enables construction of fully mass-/atom-balanced radical pathway trees, capturing side products and mechanistic branching not accessible to black-box overall transformation models.
  • Traceable atom mapping: Atom-level accuracy ensures all intermediates are chemically valid and balanced.
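
Pathway enumeration by iterating single-step predictions can be sketched as a bounded tree expansion. The code below is an illustration under stated assumptions: `predict_steps` stands in for any single-step predictor that returns ranked product states, the `breadth` and `depth` parameters are hypothetical names for the search limits, and the toy lookup table replaces a real model.

```python
def enumerate_pathways(start, predict_steps, breadth: int, depth: int):
    """Expand each pathway by its top-`breadth` predicted steps, `depth` times."""
    pathways = [[start]]
    for _ in range(depth):
        extended = []
        for path in pathways:
            successors = predict_steps(path[-1])[:breadth]
            if not successors:
                # Dead end: keep the pathway as it is.
                extended.append(path)
                continue
            for nxt in successors:
                extended.append(path + [nxt])
        pathways = extended
    return pathways


# Toy predictor: a lookup table of ranked successors (labels are placeholders).
toy_steps = {"A": ["B", "C", "D"], "B": ["E"], "C": []}


def toy_predictor(state):
    return toy_steps.get(state, [])


paths = enumerate_pathways("A", toy_predictor, breadth=2, depth=2)
```

With breadth 2 the third-ranked successor "D" is never explored, which shows how the search limits trade coverage for runtime.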

In applied settings, RMechDB and RMechRP facilitate:

  • Targeted pathway search in atmospheric chemistry (e.g., VOC oxidation, where RMechRP achieved 60% retrieval of known Master Chemical Mechanism products within 2 s per pathway under specified search breadth/depth)
  • Radical-mediated polymerization mechanistic design
  • Prediction of radical intermediates in enzymatic and bioinorganic systems
  • Exploratory synthesis planning where traditional polar-based template approaches are inadequate

The database and predictors are accessible via online interfaces for both single-step and pathway-level tasks (Tavakoli et al., 2023).

6. Broader Impact and Positioning within Mechanistic Databases

RMechDB fills a critical gap in publicly available, mechanistically annotated radical reaction data. Unlike USPTO-based datasets, it encodes explicit electron flow, supports orbital-level mechanistic learning, and covers both textbook and state-of-the-art atmospheric radical chemistry. This enables rigorous benchmarking for next-generation radical reaction models, supports the development of interpretable ML-based predictors such as RMechRP, and provides infrastructure for mechanistic cross-comparisons with other classes of databases (e.g., polar or pericyclic mechanisms).

A plausible implication is that continued expansion of RMechDB could drive advances in interpretable, orbital-level reaction prediction for heterogeneous molecular domains overlooked by traditional retrosynthetic datasets (Tavakoli et al., 2023).

