QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules (2306.09549v4)

Published 15 Jun 2023 in physics.chem-ph, cs.AI, and cs.LG

Abstract: Supervised machine learning approaches have been increasingly used in accelerating electronic structure prediction as surrogates of first-principle computational methods, such as density functional theory (DFT). While numerous quantum chemistry datasets focus on chemical properties and atomic forces, the ability to achieve accurate and efficient prediction of the Hamiltonian matrix is highly desired, as it is the most important and fundamental physical quantity that determines the quantum states of physical systems and chemical properties. In this work, we generate a new Quantum Hamiltonian dataset, named as QH9, to provide precise Hamiltonian matrices for 999 or 2998 molecular dynamics trajectories and 130,831 stable molecular geometries, based on the QM9 dataset. By designing benchmark tasks with various molecules, we show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules. Both the QH9 dataset and the baseline models are provided to the community through an open-source benchmark, which can be highly valuable for developing machine learning methods and accelerating molecular and materials design for scientific and technological applications. Our benchmark is publicly available at https://github.com/divelab/AIRS/tree/main/OpenDFT/QHBench.

Authors (7)

Haiyang Yu
Meng Liu
Youzhi Luo
Alex Strasser
Xiaofeng Qian
Xiaoning Qian
Shuiwang Ji

Summary

Quantum Hamiltonian Prediction Benchmark: The QH9 Dataset

The paper "QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules" introduces the QH9 dataset, a resource for training and evaluating machine learning models that predict quantum Hamiltonian matrices. These matrices determine the quantum states and chemical properties of molecular systems.

Dataset and Motivation

The QH9 dataset addresses a gap in existing quantum chemistry datasets, which mostly target chemical properties and atomic forces rather than the Hamiltonian matrix itself. Obtaining these matrices with Density Functional Theory (DFT) is expensive: the cost scales cubically to quartically with the number of electrons and grows further with the number of self-consistent optimization steps. QH9 builds on the QM9 dataset and provides Hamiltonian matrices for 130,831 stable molecular geometries and for 999 or 2998 molecular dynamics trajectories, covering a range of molecular sizes and geometrical configurations.
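To make the cost argument concrete, the matrix equation behind it can be written in standard textbook notation. The symbols below are not used in the summary itself and are introduced here as assumptions: H is the Hamiltonian matrix in an atomic-orbital basis, S the overlap matrix, C the orbital coefficients, and ε the diagonal matrix of orbital energies.

```latex
\mathbf{H}(\mathbf{C})\,\mathbf{C} = \mathbf{S}\,\mathbf{C}\,\boldsymbol{\epsilon},
\qquad
\mathbf{C}^{\top}\mathbf{S}\,\mathbf{C} = \mathbf{I}
```

Because H depends on the electron density, and therefore on C, a DFT code solves this equation repeatedly until H and C are consistent with each other. Each pass involves a dense diagonalization, which is where the steep scaling comes from. A model that predicts H directly from the molecular geometry aims to skip or shorten that loop.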

Benchmark Tasks

QH9 sets forth four benchmark tasks designed to assess the capability of machine learning models in different scenarios:

QH9-stable-id: Involves training with a random split of stable geometries, serving as a baseline.
QH9-stable-ood: Focuses on out-of-distribution generalization by training models on a subset of molecular sizes and evaluating them on larger, unseen molecular sizes.
QH9-dynamic-geo: Examines generalization across different geometrical configurations for the same molecule.
QH9-dynamic-mol: Addresses the harder task of generalizing from molecules seen during training to entirely unseen molecules.

These tasks provide a comprehensive framework for evaluating the predictive performance and generalization potential of models in predicting these fundamental matrices.
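The four tasks differ mainly in how the data is partitioned. The sketch below illustrates the idea with plain index lists. All function names, thresholds, and toy values are hypothetical and chosen for illustration; they are not taken from the QH9 code base, and the real benchmark may split frames randomly rather than by position.

```python
import random

def random_split(n_items, frac_train=0.8, seed=0):
    """stable-id style: shuffle all indices, then cut."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    cut = int(frac_train * n_items)
    return idx[:cut], idx[cut:]

def size_split(atom_counts, max_train_atoms):
    """stable-ood style: small molecules for training, larger ones for testing."""
    train = [i for i, n in enumerate(atom_counts) if n <= max_train_atoms]
    test = [i for i, n in enumerate(atom_counts) if n > max_train_atoms]
    return train, test

def geometry_split(traj_ids, n_train_frames):
    """dynamic-geo style: every molecule is in both sets, but with different frames."""
    seen, train, test = {}, [], []
    for i, t in enumerate(traj_ids):
        seen[t] = seen.get(t, 0) + 1
        (train if seen[t] <= n_train_frames else test).append(i)
    return train, test

def molecule_split(traj_ids, train_molecules):
    """dynamic-mol style: whole trajectories go to either train or test."""
    train = [i for i, t in enumerate(traj_ids) if t in train_molecules]
    test = [i for i, t in enumerate(traj_ids) if t not in train_molecules]
    return train, test

# Toy data (hypothetical): atom counts of six stable molecules
atom_counts = [5, 9, 12, 20, 23, 27]
id_train, id_test = random_split(10)
ood_train, ood_test = size_split(atom_counts, max_train_atoms=20)

# Toy data (hypothetical): two trajectories "a" and "b" with three frames each
traj_ids = ["a", "a", "a", "b", "b", "b"]
geo_train, geo_test = geometry_split(traj_ids, n_train_frames=2)
mol_train, mol_test = molecule_split(traj_ids, train_molecules={"a"})
```

In the geometry split both trajectories contribute to training and testing, whereas in the molecule split trajectory "b" is never seen during training, which is what makes that task harder.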

Methodology and Results

The paper uses QHNet, an SE(3)-equivariant graph neural network for quantum tensor prediction, as the primary evaluation model. QHNet is designed to respect the symmetry and equivariance properties of the Hamiltonian matrix: when the molecule is rotated, the predicted matrix blocks transform accordingly. The reported results show that QHNet predicts Hamiltonian matrices accurately across the tasks, and that derived properties such as orbital energies and electronic wavefunctions are also recovered well.
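Accuracy on the Hamiltonian and on its derived quantities can be measured in several ways. The following is a minimal sketch of three plausible metrics, assuming NumPy arrays as inputs. The function names are hypothetical, and the exact metric definitions used in the paper may differ from these.

```python
import numpy as np

def hamiltonian_mae(H_pred, H_true):
    """Mean absolute error over all matrix elements."""
    return float(np.mean(np.abs(H_pred - H_true)))

def orbital_energy_mae(eps_pred, eps_true):
    """Mean absolute error over orbital energies (same ordering assumed)."""
    return float(np.mean(np.abs(eps_pred - eps_true)))

def coefficient_similarity(C_pred, C_true):
    """Mean absolute cosine similarity between matching orbital columns.

    The absolute value is taken because the sign of an eigenvector is arbitrary.
    """
    num = np.abs(np.sum(C_pred * C_true, axis=0))
    den = np.linalg.norm(C_pred, axis=0) * np.linalg.norm(C_true, axis=0)
    return float(np.mean(num / den))

# Toy values (made up for illustration)
H_true = np.array([[-1.0, -0.5], [-0.5, -0.5]])
H_pred = H_true + 0.01
mae_h = hamiltonian_mae(H_pred, H_true)

C_true = np.eye(2)
C_pred = -np.eye(2)  # same orbitals, flipped sign
sim = coefficient_similarity(C_pred, C_true)
```

The sign-flipped coefficients still score a similarity of one, which is the intended behavior: they describe the same orbitals.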

In the out-of-distribution experiments, models find it harder to stay accurate when test molecules are larger than those seen during training. In addition, initializing DFT calculations with matrices predicted by QHNet reduces the number of optimization steps needed to converge, which translates into computational savings.
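One way a predicted Hamiltonian can seed a DFT calculation is through the density matrix built from its lowest-energy orbitals. The sketch below shows that step for a closed-shell system using NumPy only. The function name and the toy matrices are hypothetical, and this is an illustration of the general idea rather than the paper's implementation.

```python
import numpy as np

def density_from_hamiltonian(H, S, n_electrons):
    """Closed-shell density matrix from the lowest-energy orbitals of H."""
    # Loewdin orthogonalization: X = S^{-1/2}
    s_vals, s_vecs = np.linalg.eigh(S)
    X = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.T
    # Orbital energies (ascending) and coefficients solving H C = S C eps
    eps, C_orth = np.linalg.eigh(X @ H @ X)
    C = X @ C_orth
    # Doubly occupy the lowest n_electrons / 2 orbitals
    C_occ = C[:, : n_electrons // 2]
    return 2.0 * C_occ @ C_occ.T

# Toy 2x2 example with made-up numbers standing in for a model prediction
H_pred = np.array([[-1.0, -0.5], [-0.5, -0.5]])
S = np.array([[1.0, 0.2], [0.2, 1.0]])
D0 = density_from_hamiltonian(H_pred, S, n_electrons=2)

# Sanity check: trace(D S) equals the number of electrons
n_check = float(np.trace(D0 @ S))
```

A matrix like `D0` could then be handed to a DFT package as the starting density. In PySCF, for instance, the SCF `kernel` method accepts an initial density matrix through its `dm0` argument. The closer the predicted Hamiltonian is to the converged one, the fewer self-consistent iterations should be needed.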

Implications and Future Directions

QH9 makes Hamiltonian prediction a practical target for machine learning in quantum chemistry. By pairing a large dataset with defined benchmark tasks, the work supports the development of models that generalize beyond their training data and reduce the computational cost of electronic structure calculations.

Future research could develop models with better prediction accuracy and generalization, particularly to molecules larger than those in the training set. Integrating such models into DFT workflows could also reduce computational cost and speed up material and molecular design.

Overall, QH9 provides a dataset and benchmark for developing and comparing machine learning models that predict Hamiltonian matrices, with relevance to both method development and practical applications in quantum chemistry.
