ML Models of Vibrating H$_2$CO: Comparing Reproducing Kernels, FCHL and PhysNet (2006.16752v1)
Abstract: Machine Learning (ML) has become a promising tool for improving the quality of atomistic simulations. Using formaldehyde as a benchmark system for intramolecular interactions, a comparative assessment of ML models based on state-of-the-art variants of deep neural networks (NN), reproducing kernel Hilbert space (RKHS+F), and kernel ridge regression (KRR) is presented. Learning curves for energies and atomic forces indicate rapid convergence towards excellent predictions for B3LYP, MP2, and CCSD(T)-F12 reference results for modestly sized (in the hundreds) training sets. Typically, learning curve off-sets decay as one goes from NN (PhysNet) to RKHS+F to KRR (FCHL). Conversely, the predictive power for extrapolation of energies towards new geometries increases in the same order with RKHS+F and FCHL performing almost equally. For harmonic vibrational frequencies, the picture is less clear, with PhysNet and FCHL yielding respectively flat learning at $\sim$ 1 and $\sim$ 0.2 cm${-1}$ no matter which reference method, while RKHS+F models level off for B3LYP, and exhibit continued improvements for MP2 and CCSD(T)-F12. Finite-temperature molecular dynamics (MD) simulations with the same initial conditions yield indistinguishable infrared spectra with good performance compared with experiment except for the high-frequency modes involving hydrogen stretch motion which is a known limitation of MD for vibrational spectroscopy. For sufficiently large training set sizes all three models can detect insufficient convergence (``noise'') of the reference electronic structure calculations in that the learning curves level off. Transfer learning (TL) from B3LYP to CCSD(T)-F12 with PhysNet indicates that additional improvements in data efficiency can be achieved.
Collections
Sign up for free to add this paper to one or more collections.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.