Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
129 tokens/sec
GPT-4o
28 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

G-MATT: Single-step Retrosynthesis Prediction using Molecular Grammar Tree Transformer (2305.03153v2)

Published 4 May 2023 in cs.LG, cs.AI, cs.FL, cs.SC, and q-bio.QM

Abstract: Various template-based and template-free approaches have been proposed for single-step retrosynthesis prediction in recent years. While these approaches demonstrate strong performance from a data-driven metrics standpoint, many model architectures do not incorporate underlying chemistry principles. Here, we propose a novel chemistry-aware retrosynthesis prediction framework that combines powerful data-driven models with prior domain knowledge. We present a tree-to-sequence transformer architecture that utilizes hierarchical SMILES grammar-based trees, incorporating crucial chemistry information that is often overlooked by SMILES text-based representations, such as local structures and functional groups. The proposed framework, grammar-based molecular attention tree transformer (G-MATT), achieves significant performance improvements compared to baseline retrosynthesis models. G-MATT achieves a promising top-1 accuracy of 51% (top-10 accuracy of 79.1%), invalid rate of 1.5%, and bioactive similarity rate of 74.8% on the USPTO- 50K dataset. Additional analyses of G-MATT attention maps demonstrate the ability to retain chemistry knowledge without relying on excessively complex model architectures.

Citations (6)

Summary

We haven't generated a summary for this paper yet.