Enhancing NMR Shielding Predictions of Atoms-in-Molecules Machine Learning Models with Neighborhood-Informed Representations (2510.05623v1)
Abstract: Accurate prediction of nuclear magnetic resonance (NMR) shielding with ML models remains a central challenge for data-driven spectroscopy. We present atomic variants of the Coulomb matrix (aCM) and bag-of-bonds (aBoB) descriptors, and extend them using radial basis functions (RBFs) to yield smooth, per-atom representations (aCM-RBF, aBoB-RBF). Local structural information is incorporated by augmenting each atomic descriptor with contributions from the n nearest neighbors, resulting in the family of descriptors, aCM-RBF(n) and aBoB-RBF(n). For 13C shielding prediction on the QM9NMR dataset (831,925 shielding values across 130,831 molecules), aBoB-RBF(4) achieves an out-of-sample mean error of 1.69 ppm, outperforming models reported in previous studies. While explicit three-body descriptors further reduce errors at a higher cost, aBoB-RBF(4) offers the best balance of accuracy and efficiency. Benchmarking on external datasets comprising larger molecules (GDBm, Drug12/Drug40, and pyrimidinone derivatives) confirms the robustness and transferability of aBoB-RBF(4), establishing it as a practical tool for ML-based NMR shielding prediction.
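The abstract names the ingredients of the aBoB-RBF(n) family (a per-atom Coulomb-type descriptor, a radial-basis expansion, and augmentation with the n nearest neighbors) but gives no formulas. The following is therefore only a minimal sketch of the general idea, not the authors' definition. The function names, the Gaussian form and width, the grid of centers, the Z_i Z_j / r_ij weighting, and the concatenate-and-zero-pad scheme are all assumptions made for illustration.

```python
import numpy as np


def rbf_descriptor(Z, R, i, centers, width=0.5):
    """Hypothetical smooth per-atom vector for atom i.

    Each other atom j contributes a Gaussian centred on the distance r_ij,
    weighted by the Coulomb-like term Z_i * Z_j / r_ij (an assumption).
    Summing over j makes the result independent of atom ordering.
    """
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    centers = np.asarray(centers, dtype=float)
    others = np.arange(len(Z)) != i
    dist = np.linalg.norm(R[others] - R[i], axis=1)
    weight = Z[i] * Z[others] / dist
    gauss = np.exp(-((dist[:, None] - centers[None, :]) ** 2) / (2.0 * width**2))
    return (weight[:, None] * gauss).sum(axis=0)


def neighborhood_descriptor(Z, R, i, n, centers, width=0.5):
    """Hypothetical neighborhood-informed descriptor for atom i.

    Concatenates the vector of atom i with those of its n nearest
    neighbors (sorted by distance), zero-padding when the molecule
    has fewer than n other atoms. The padding rule is an assumption.
    """
    R = np.asarray(R, dtype=float)
    centers = np.asarray(centers, dtype=float)
    dist = np.linalg.norm(R - R[i], axis=1)
    neighbors = [j for j in np.argsort(dist) if j != i][:n]
    parts = [rbf_descriptor(Z, R, k, centers, width) for k in [i, *neighbors]]
    parts += [np.zeros_like(centers)] * (n - len(neighbors))
    return np.concatenate(parts)


# Usage: a methane-like geometry (coordinates in angstrom, illustrative only).
Z = [6, 1, 1, 1, 1]
R = [
    [0.00, 0.00, 0.00],
    [0.63, 0.63, 0.63],
    [-0.63, -0.63, 0.63],
    [-0.63, 0.63, -0.63],
    [0.63, -0.63, -0.63],
]
centers = np.linspace(0.5, 5.0, 10)
vec = neighborhood_descriptor(Z, R, i=0, n=4, centers=centers)
print(vec.shape)  # (50,): (1 + 4 neighbors) * 10 radial centers
```

In this sketch the descriptor length grows linearly with n, which is one plausible reading of why a small neighborhood such as n = 4 is described as a good balance of accuracy and efficiency.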