Revving up 13C NMR shielding predictions across chemical space: Benchmarks for atoms-in-molecules kernel machine learning with new data for 134 kilo molecules (2009.06814v3)

Published 15 Sep 2020 in physics.chem-ph and physics.data-an

Abstract: The requirement for accelerated and quantitatively accurate screening of nuclear magnetic resonance spectra across the small-molecule chemical compound space is two-fold: (1) a robust 'local' ML strategy capturing the effect of the neighbourhood on an atom's 'near-sighted' property, chemical shielding; (2) an accurate reference dataset generated with a state-of-the-art first-principles method for training. Herein we report the QM9-NMR dataset comprising isotropic shielding of over 0.8 million C atoms in 134k molecules of the QM9 dataset in the gas phase and in five common solvent phases. Using these data for training, we present benchmark results for the prediction transferability of kernel ridge regression models with popular local descriptors. Our best model, trained on 100k samples, accurately predicts the isotropic shielding of 50k 'hold-out' atoms with a mean error of less than $1.9$ ppm. For rapid prediction on new query molecules, the models were trained on geometries from an inexpensive level of theory. Furthermore, by using a $\Delta$-ML strategy, we quench the error below $1.4$ ppm. Finally, we test the transferability on non-trivial benchmark sets that include molecules comprising 10 to 17 heavy atoms, as well as drugs.
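The abstract combines two modelling ideas: kernel ridge regression (KRR) on per-atom ("local") descriptors, and a Δ-ML correction that learns the difference between a cheap baseline and the reference shielding. The pure-Python code below is a minimal sketch of both ideas. It is not the authors' code and uses no QM9-NMR data. Everything specific in it is an assumption for illustration: the Gaussian kernel, the hyperparameters `sigma` and `lam`, the random 3-D vectors that stand in for a real local atomic descriptor, the toy "reference" and "baseline" shielding functions, and the hypothetical helper names `AtomKRR`, `toy_data` and `demo`.

```python
import math
import random


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel between two local atomic descriptors (assumed kernel choice)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def cholesky_solve(A, b):
    """Solve A x = b for a symmetric positive-definite matrix A via Cholesky."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n  # forward substitution: L z = b
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n  # back substitution: L^T x = z
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class AtomKRR:
    """Hypothetical per-atom KRR: alpha = (K + lam*I)^-1 (y - mean(y))."""

    def __init__(self, sigma=0.5, lam=1e-4):
        self.sigma, self.lam = sigma, lam

    def fit(self, X, y):
        self.X = list(X)
        self.mean = sum(y) / len(y)  # centre targets (a common, optional choice)
        n = len(self.X)
        K = [[gaussian_kernel(self.X[i], self.X[j], self.sigma)
              + (self.lam if i == j else 0.0) for j in range(n)] for i in range(n)]
        self.alpha = cholesky_solve(K, [t - self.mean for t in y])
        return self

    def predict(self, Xq):
        # Prediction is a kernel-weighted sum over all training atoms.
        return [self.mean + sum(a * gaussian_kernel(x, xt, self.sigma)
                                for a, xt in zip(self.alpha, self.X)) for x in Xq]


def mae(pred, ref):
    """Mean absolute error between predictions and reference values."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def toy_data(n, rng):
    """Hypothetical data: random 3-D 'descriptors', a smooth 'reference' shielding,
    and a 'cheap baseline' that differs from it by a smooth systematic error."""
    X = [[rng.random() for _ in range(3)] for _ in range(n)]
    ref = [100.0 + 30.0 * math.sin(3.0 * x[0]) + 20.0 * x[1] * x[2] for x in X]
    base = [r + 5.0 + 2.0 * x[0] for r, x in zip(ref, X)]
    return X, ref, base


def demo(n_train=60, n_test=20, seed=0):
    rng = random.Random(seed)
    X_tr, ref_tr, base_tr = toy_data(n_train, rng)
    X_te, ref_te, base_te = toy_data(n_test, rng)
    # Direct model: learn the reference shielding itself.
    direct = AtomKRR().fit(X_tr, ref_tr).predict(X_te)
    # Delta-ML: learn (reference - baseline), then add the baseline back.
    delta = AtomKRR().fit(X_tr, [r - b for r, b in zip(ref_tr, base_tr)])
    corrected = [b + d for b, d in zip(base_te, delta.predict(X_te))]
    return mae(direct, ref_te), mae(corrected, ref_te)


if __name__ == "__main__":
    mae_direct, mae_delta = demo()
    print(f"direct MAE: {mae_direct:.3f}  delta-ML MAE: {mae_delta:.3f}")
```

The Δ-ML model is trained only on the residual between two levels of theory. Any structure the baseline already captures therefore does not have to be learned. Connecting this sketch to the benchmark would mean substituting real atomic descriptors and per-atom shielding values for the toy inputs. A library linear-algebra routine would normally replace the hand-written Cholesky solve.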

