HORM: A Large Scale Molecular Hessian Database for Optimizing Reactive Machine Learning Interatomic Potentials (2505.12447v1)
Abstract: Transition state (TS) characterization is central to computational reaction modeling, yet conventional approaches depend on expensive density functional theory (DFT) calculations, limiting their scalability. Machine learning interatomic potentials (MLIPs) have emerged as a promising approach to accelerate TS searches by approximating quantum-level accuracy at a fraction of the cost. However, most MLIPs are primarily designed for energy and force prediction, thus their capacity to accurately estimate Hessians, which are crucial for TS optimization, remains constrained by limited training data and inadequate learning strategies. This work introduces the Hessian dataset for Optimizing Reactive MLIP (HORM), the largest quantum chemistry Hessian database dedicated to reactive systems, comprising 1.84 million Hessian matrices computed at the $\omega$B97x/6-31G(d) level of theory. To effectively leverage this dataset, we adopt a Hessian-informed training strategy that incorporates stochastic row sampling, which addresses the dramatically increased cost and complexity of incorporating second-order information into MLIPs. Various MLIP architectures and force prediction schemes trained on HORM demonstrate up to 63% reduction in the Hessian mean absolute error and up to 200 times increase in TS search compared to models trained without Hessian information. These results highlight how HORM addresses critical data and methodological gaps, enabling the development of more accurate and robust reactive MLIPs for large-scale reaction network exploration.