Leveraging Persistent Homology Features for Accurate Defect Formation Energy Predictions via Graph Neural Networks (2407.05204v2)

Published 6 Jul 2024 in cond-mat.mtrl-sci

Abstract: In machine-learning-assisted high-throughput defect studies, a defect-aware latent representation of the supercell structure is crucial for accurately predicting defect properties. The performance of current graph neural network (GNN) models is limited for two reasons: defect properties depend strongly on the local atomic configurations near the defect sites, and GNNs suffer from the over-smoothing problem. Herein, we demonstrate that persistent homology features, which encode the topological information of the local chemical environment around each atomic site, can characterize the structural information of defects. Using a dataset covering a wide spectrum of O-based perovskites with all available vacancies as an example, we show that incorporating persistent homology features, together with proper choices of graph pooling operations, significantly increases the prediction accuracy, reducing the mean absolute error (MAE) by 55%. These features can be easily integrated into state-of-the-art GNN models, including the graph Transformer network and the equivariant neural network, and universally improve their performance. In addition, our model overcomes the convergence issue with respect to supercell size that was present in previous GNN models. Furthermore, using datasets of defective BaTiO3 with multiple substitutions and multiple vacancies as examples, we show that our GNN model can also accurately predict defect-defect interactions. These results suggest that persistent homology features can effectively improve the performance of machine learning models and assist the accelerated discovery of functional defects for technological applications.
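The abstract does not spell out how the persistent homology features are built, so the following is only an illustrative sketch of the general idea, not the authors' featurization. It computes the simplest such descriptor, 0-dimensional (H0) persistence of a Vietoris-Rips filtration over the atoms within a cutoff of a site, and summarizes the death scales with four statistics. Assumptions: Cartesian coordinates with no periodic images, H0 only (connected components, whose death scales equal minimum-spanning-tree edge lengths), and the helper names `h0_persistence` and `site_features` and the chosen statistics are hypothetical. In a GNN, vectors like these could be concatenated onto each node's input features.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Finite death scales of 0-dim (H0) features in a Vietoris-Rips filtration.

    Every atom starts as its own connected component at scale 0. A component
    dies when an edge joining it to another component enters the filtration,
    so the death scales are the edge lengths of a minimum spanning tree
    (found here with Kruskal's algorithm and union-find). The one component
    that never dies (infinite persistence) is omitted.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances in increasing order (the filtration order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components, so one of them dies
            parent[ri] = rj
            deaths.append(d)
            if len(deaths) == n - 1:
                break  # all atoms are now connected
    return deaths


def site_features(positions, site, cutoff):
    """Hypothetical per-site descriptor: min, max, mean and count of the H0
    death scales of the atoms within `cutoff` of positions[site] (site included).
    Assumes Cartesian coordinates with no periodic images."""
    center = positions[site]
    local = [p for p in positions if math.dist(p, center) <= cutoff]
    deaths = h0_persistence(local)
    if not deaths:
        return [0.0, 0.0, 0.0, 0.0]
    return [min(deaths), max(deaths), sum(deaths) / len(deaths), float(len(deaths))]


# Toy example: a five-atom chain, and the same chain with the middle atom removed.
pristine = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
vacancy = [p for p in pristine if p != (2.0, 0.0, 0.0)]

print(site_features(pristine, 1, 2.5))  # [1.0, 1.0, 1.0, 3.0]
print(site_features(vacancy, 1, 2.5))   # [1.0, 2.0, 1.5, 2.0]
```

Removing the middle atom raises the largest H0 death scale seen from site 1 from 1.0 to 2.0, so even this crude descriptor registers the gap left by the vacancy. H0 alone cannot detect rings or cavities; those require higher-dimensional homology (H1, H2), usually computed with a dedicated library such as GUDHI or Ripser, and the features and pooling used in the paper may differ substantially from this sketch.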