Benchmarking Data Efficiency in $\Delta$-ML and Multifidelity Models for Quantum Chemistry (2410.11391v3)
Abstract: The development of ML methods has made quantum chemistry (QC) calculations more accessible by reducing the compute cost incurred by conventional QC methods. This cost, however, has effectively been shifted to the overhead of generating training data. Efforts to reduce this training-data cost have led to the development of $\Delta$-ML and multifidelity machine learning methods, which use data at more than one QC level of accuracy, or fidelity. This work compares the data costs associated with $\Delta$-ML, multifidelity machine learning (MFML), and optimized MFML (o-MFML) against a newly introduced Multifidelity $\Delta$-Machine Learning (MF$\Delta$ML) method for the prediction of ground state energies, vertical excitation energies, and the magnitude of the electronic contribution to molecular dipole moments from the multifidelity benchmark dataset QeMFi. The assessment is based on the training data generation cost associated with each model and is made relative to the single-fidelity kernel ridge regression (KRR) case. The results indicate that multifidelity methods surpass the standard $\Delta$-ML approaches when a large number of predictions is required. For applications that require only a few ML evaluations, where the $\Delta$-ML method might be expected to be favored, the MF$\Delta$ML method is shown to be more efficient.
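To make the distinction concrete, below is a minimal sketch of the standard $\Delta$-ML ansatz with KRR, using synthetic placeholder descriptors and mock low/high-fidelity energies. It is not the paper's workflow (which uses the QeMFi dataset and the multifidelity variants named above); all function and variable names here are illustrative assumptions.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Synthetic placeholder data: in practice these would be molecular
# descriptors paired with properties computed at a cheap (low-fidelity)
# and an expensive (high-fidelity) QC level.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 16))   # descriptors, training set
X_test = rng.normal(size=(50, 16))     # descriptors, query set

def mock_low(X):   # stand-in for a cheap QC method
    return X.sum(axis=1)

def mock_high(X):  # stand-in for an expensive QC method
    return X.sum(axis=1) + 0.1 * np.sin(X[:, 0])

E_low_train = mock_low(X_train)
E_high_train = mock_high(X_train)

# Delta-ML: a KRR model learns only the difference between fidelities.
delta_model = KernelRidge(kernel="rbf", alpha=1e-6, gamma=0.05)
delta_model.fit(X_train, E_high_train - E_low_train)

# Prediction still requires a low-fidelity calculation for every query,
# which is the per-prediction cost that multifidelity variants aim to avoid.
E_high_pred = mock_low(X_test) + delta_model.predict(X_test)
```

In contrast, a multifidelity model (MFML, o-MFML, or MF$\Delta$ML) combines sub-models trained at several fidelities so that no low-fidelity calculation is needed at prediction time, which is why the cost comparison in this work depends on how many predictions an application ultimately requires.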