
Mapping the Space of Chemical Reactions Using Attention-Based Neural Networks (2012.06051v1)

Published 9 Dec 2020 in physics.chem-ph, cs.CL, and cs.LG

Abstract: Organic reactions are usually assigned to classes containing reactions with similar reagents and mechanisms. Reaction classes facilitate the communication of complex concepts and efficient navigation through chemical reaction space. However, the classification process is a tedious task. It requires the identification of the corresponding reaction class template via annotation of the number of molecules in the reactions, the reaction center, and the distinction between reactants and reagents. This work shows that transformer-based models can infer reaction classes from non-annotated, simple text-based representations of chemical reactions. Our best model reaches a classification accuracy of 98.2%. We also show that the learned representations can be used as reaction fingerprints that capture fine-grained differences between reaction classes better than traditional reaction fingerprints. The insights into chemical reaction space enabled by our learned fingerprints are illustrated by an interactive reaction atlas providing visual clustering and similarity searching.

Authors (7)
  1. Philippe Schwaller (38 papers)
  2. Daniel Probst (7 papers)
  3. Alain C. Vaucher (12 papers)
  4. Vishnu H. Nair (2 papers)
  5. David Kreutter (1 paper)
  6. Teodoro Laino (20 papers)
  7. Jean-Louis Reymond (2 papers)
Citations (191)

Summary

Mapping the Space of Chemical Reactions Using Attention-Based Neural Networks

The paper "Mapping the Space of Chemical Reactions Using Attention-Based Neural Networks" presents an investigation into the application of transformer-based models for the classification and fingerprinting of chemical reactions. The authors, Schwaller et al., demonstrate that these models can infer reaction classes from simple text-based SMILES representations without the need for detailed annotations, achieving a classification accuracy of 98.2% at their best.

Reaction Classification Using Transformer Models

The research uses two types of transformer models: an encoder-decoder model for sequence-to-sequence tasks and a BERT model for single-sentence classification. The BERT model exhibited the best performance, with a classification accuracy of 98.2% on a dataset comprising 792 different reaction classes. Importantly, this approach eliminates the need for conventional atom-mapping and for the often ambiguous separation of reactants and reagents. By analyzing attention weights, the authors observe that key reaction components, such as the atoms in the reaction center, receive higher attention, highlighting chemically significant motifs learned by the model.
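The input side of this pipeline can be sketched concretely: a reaction SMILES string is split into tokens and fed to the model as plain text, with no atom-mapping or reactant/reagent annotation. The sketch below uses a regex-based SMILES tokenizer of the kind commonly used for reaction transformers; the exact pattern and the example reaction are illustrative, not the authors' published preprocessing code.

```python
import re

# Regex covering bracketed atoms, two-letter halogens, organic-subset
# atoms, ring-closure digits, and structural symbols (bonds, branches,
# the "." molecule separator, and the ">" of reaction SMILES).
SMILES_REGEX = re.compile(
    r"\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p"
    r"|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9]"
)

def tokenize_reaction(rxn_smiles: str) -> list[str]:
    """Split a reaction SMILES string into model-ready tokens."""
    return SMILES_REGEX.findall(rxn_smiles)

# Esterification written as reaction SMILES: reactants >> product.
# Note there is no atom-mapping and no reactant/reagent split.
rxn = "CC(=O)Cl.OCC>>CC(=O)OCC"
print(tokenize_reaction(rxn))
```

The token sequence is what a BERT-style classifier consumes; because the tokenizer keeps multi-character atoms like `Cl` intact, the model sees chemically meaningful units rather than raw characters.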

Development of Reaction Fingerprints

Beyond classification, the paper introduces reaction fingerprints derived from BERT embeddings. These fingerprints are universal in that they are independent of the number of molecules in a reaction, so they can be applied uniformly across diverse chemical datasets. Leveraging these fingerprints, the authors built an interactive "reaction atlas" using TMAP, a visualization method that maps high-dimensional spaces onto tree-like graphs in which reactions cluster by class. The atlas supports navigation and similarity searching within chemical databases, offering practical utility for chemists in synthesis planning and condition optimization.
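The downstream use of such a fingerprint can be sketched as follows: every reaction maps to one fixed-length unit vector, and similarity search is a cosine-similarity nearest-neighbor lookup. The embedding function below is a deliberate stand-in (hashed character-bigram counts) so the sketch stays self-contained; in the paper the vectors come from the hidden states of the trained BERT encoder, but the search logic is the same.

```python
import zlib
import numpy as np

def reaction_fingerprint(rxn_smiles: str, dim: int = 256) -> np.ndarray:
    """Stand-in fingerprint: hashed character-bigram counts, L2-normalized.
    The paper instead derives the vector from a trained BERT encoder;
    either way the result is one fixed-length vector per reaction,
    independent of how many molecules the reaction contains."""
    v = np.zeros(dim)
    for a, b in zip(rxn_smiles, rxn_smiles[1:]):
        v[zlib.crc32((a + b).encode()) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-12)

def most_similar(query: str, database: list[str]) -> str:
    """Return the database reaction whose fingerprint is closest to the
    query's. Vectors are unit-normalized, so the dot product equals
    the cosine similarity."""
    q = reaction_fingerprint(query)
    sims = [float(q @ reaction_fingerprint(r)) for r in database]
    return database[int(np.argmax(sims))]

db = [
    "CC(=O)Cl.OCC>>CC(=O)OCC",  # acid chloride + ethanol -> ester
    "c1ccc(Br)cc1.OB(O)c1ccccc1>>c1ccc(-c2ccccc2)cc1",  # aryl coupling
]
# A methanol esterification query should land nearest the ester entry.
print(most_similar("CC(=O)Cl.OC>>CC(=O)OC", db))
```

With a learned encoder in place of the bigram hash, the same nearest-neighbor loop is what powers similarity searching in the reaction atlas.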

Evaluation and Implications

The proposed approach substantially surpasses traditional methods: a reactant-reagent-based fingerprint achieved only 41% accuracy on a similar classification task. The research underscores the transformative potential of attention-based models for digital chemistry, particularly organic synthesis research. By advancing classification accuracy and introducing robust fingerprints, the methodology supports more precise reaction-condition prediction and better-curated reaction data, with implications for both mechanistic insight and practical synthesis optimization.

Future Directions

The findings open avenues for further exploration of advanced AI-driven chemical reaction prediction and classification systems. The potential for these models to improve reaction yield predictions and activation energy estimation is noteworthy, paving the way for increased adoption in automated synthesis planning tools and in databases that require efficient retrieval and analysis of chemical reactions.

This work illustrates the efficacy of attention-based neural networks in deciphering chemical transformations, setting a benchmark for future developments in computational chemistry, particularly in enhancing the capabilities of AI-driven systems in the experimental and practical domains of chemical synthesis.