Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models (2410.12771v1)

Published 16 Oct 2024 in cond-mat.mtrl-sci, cs.AI, and physics.comp-ph

Abstract: The ability to discover new materials with desirable properties is critical for numerous applications from helping mitigate climate change to advances in next generation computing hardware. AI has the potential to accelerate materials discovery and design by more effectively exploring the chemical space compared to other computational methods or by trial-and-error. While substantial progress has been made on AI for materials data, benchmarks, and models, a barrier that has emerged is the lack of publicly available training data and open pre-trained models. To address this, we present a Meta FAIR release of the Open Materials 2024 (OMat24) large-scale open dataset and an accompanying set of pre-trained models. OMat24 contains over 110 million density functional theory (DFT) calculations focused on structural and compositional diversity. Our EquiformerV2 models achieve state-of-the-art performance on the Matbench Discovery leaderboard and are capable of predicting ground-state stability and formation energies to an F1 score above 0.9 and an accuracy of 20 meV/atom, respectively. We explore the impact of model size, auxiliary denoising objectives, and fine-tuning on performance across a range of datasets including OMat24, MPtraj, and Alexandria. The open release of the OMat24 dataset and models enables the research community to build upon our efforts and drive further advancements in AI-assisted materials science.

Summary

The paper introduces the OMat24 dataset, comprising over 110 million DFT calculations, to enable scalable inorganic materials discovery.
The authors use the EquiformerV2 Graph Neural Network architecture to improve prediction accuracy across varied non-equilibrium structures.
Models trained on the dataset offer a far cheaper alternative to running DFT directly for screening candidate materials, accelerating research in materials science.

Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models

The paper introduces the Open Materials 2024 (OMat24) dataset and associated models, developed by the Fundamental AI Research (FAIR) group at Meta, focusing on inorganic materials research. This initiative aims to advance materials discovery by leveraging large-scale machine learning models, significantly scaling up from previous datasets and models.

Context and Objectives

Materials discovery is crucial for addressing global challenges, such as developing renewable energy solutions. The computational search space for potential new materials is vast, necessitating efficient filtering techniques to identify promising candidates. Traditionally, Density Functional Theory (DFT) has been used for calculating formation energies, but it is computationally expensive. Recent advances in machine learning, particularly using Graph Neural Networks (GNNs), have started to provide cost-effective alternatives to DFT.
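To make the quantity being predicted concrete, here is a minimal sketch of how a formation energy per atom is commonly computed from a total energy and per-element reference energies. The function name, argument names, and the numbers in the usage note are illustrative assumptions and are not taken from the paper.

```python
def formation_energy_per_atom(total_energy, composition, reference_energies):
    """Formation energy per atom, in the same units as the inputs (e.g. eV).

    total_energy       -- total energy of the structure (from DFT or an ML model)
    composition        -- dict mapping element symbol to atom count in the structure
    reference_energies -- dict mapping element symbol to the per-atom energy of the
                          elemental reference phase (hypothetical values in this sketch)
    """
    n_atoms = sum(composition.values())
    if n_atoms == 0:
        raise ValueError("composition must contain at least one atom")
    # Energy of the same atoms if they sat in their elemental reference phases.
    reference_total = sum(
        count * reference_energies[element] for element, count in composition.items()
    )
    return (total_energy - reference_total) / n_atoms
```

With made-up inputs, `formation_energy_per_atom(-10.0, {"A": 1, "B": 1}, {"A": -4.0, "B": -5.0})` evaluates to -0.5, meaning the compound is lower in energy than its separated elements by 0.5 per atom.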

The OMat24 dataset offers a collection of over 110 million DFT calculations, focusing on structural and compositional diversity with an emphasis on non-equilibrium atomic configurations. It builds upon existing datasets such as MPtrj, the Materials Project, and Alexandria, aiming to improve the prediction accuracy of ML models for materials science.

Dataset Composition and Methodology

OMat24 consists of DFT calculations on structures generated through Boltzmann sampling of rattled structures, ab initio molecular dynamics (AIMD), and relaxations of rattled structures. The result is a wide array of structures with energy, force, and stress labels, providing broad compositional diversity and coverage of non-equilibrium configurations.
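The general idea behind rattling and Boltzmann sampling can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the noise model (independent Gaussian displacements), the temperature, the source of the candidate energies, and every function name here are hypothetical.

```python
import math
import random

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def rattle(positions, sigma, rng):
    """Return a copy of positions with Gaussian noise (std sigma) added to each coordinate."""
    return [[x + rng.gauss(0.0, sigma) for x in atom] for atom in positions]


def boltzmann_weights(energies, temperature):
    """Normalized weights proportional to exp(-(E - E_min) / kT), energies in eV."""
    kt = K_B_EV_PER_K * temperature
    e_min = min(energies)  # shift by the minimum for numerical stability
    raw = [math.exp(-(e - e_min) / kt) for e in energies]
    total = sum(raw)
    return [w / total for w in raw]


def sample_candidates(candidates, energies, temperature, n_samples, rng):
    """Draw n_samples candidates (with replacement) according to their Boltzmann weights."""
    weights = boltzmann_weights(energies, temperature)
    return rng.choices(candidates, weights=weights, k=n_samples)
```

A typical use would rattle one relaxed structure many times, estimate an energy for each rattled copy with some cheap method, and pass the copies and energies to `sample_candidates`. Lower-energy copies are favored, while a higher temperature flattens the weights and admits configurations further from equilibrium.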

Generating the dataset required over 400 million core hours of computation. Structures range from 1 to 100 atoms, with an emphasis on non-equilibrium states so that trained models can better predict properties far from equilibrium.

Models and Training Strategies

The authors employ EquiformerV2, a GNN architecture with strong reported performance on similar datasets. The training strategy explores three avenues: training directly on OMat24, training on MPtrj alone, and fine-tuning pre-trained models on MPtrj and on a subset of Alexandria.

OMat24 Model Training: Models trained solely on OMat24 exhibit enhanced generalizability across in-domain (ID) and out-of-domain (OOD) datasets, validating the dataset's diversity and richness.
Compliant Models: Models trained exclusively on the MPtrj dataset achieve state-of-the-art performance on the Matbench Discovery benchmark, underscoring the efficacy of the EquiformerV2 architecture, especially when augmented with the Denoising Non-equilibrium Structures (DeNS) objective.
Fine-tuned Models: Pre-training on OMat24 significantly improves performance when models are fine-tuned on MPtrj, achieving the highest scores on the Matbench Discovery leaderboard, with notable improvements in energy-above-hull predictions.
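The stability results are summarized with an F1 score, which can be read as a classification metric over energy-above-hull values. The sketch below assumes a structure counts as stable when its energy above hull is at or below a threshold of 0 eV/atom; that threshold, the function name, and the sample numbers in the usage note are assumptions for illustration rather than details confirmed by this summary.

```python
def stability_f1(e_hull_true, e_hull_pred, threshold=0.0):
    """F1 score for stable/unstable classification from energy-above-hull values.

    e_hull_true -- reference (e.g. DFT) energies above hull, eV/atom
    e_hull_pred -- model-predicted energies above hull, eV/atom
    threshold   -- a structure is labeled stable if its value is <= threshold (assumed)
    """
    if len(e_hull_true) != len(e_hull_pred):
        raise ValueError("inputs must have the same length")
    tp = fp = fn = 0
    for true_value, pred_value in zip(e_hull_true, e_hull_pred):
        true_stable = true_value <= threshold
        pred_stable = pred_value <= threshold
        if true_stable and pred_stable:
            tp += 1  # correctly predicted stable
        elif pred_stable:
            fp += 1  # predicted stable, actually unstable
        elif true_stable:
            fn += 1  # missed a stable structure
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with reference values `[-0.05, 0.0, 0.10, 0.02]` and predictions `[-0.03, 0.01, 0.12, -0.01]`, there is one true positive, one false positive, and one false negative, so the function returns 0.5.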

Implications and Future Directions

The OMat24 dataset and models raise the bar for open resources in materials discovery. Releasing both openly facilitates further research on machine learning applications in materials science. The results highlight the potential of AI both to predict material properties more accurately and to accelerate the discovery process.

The strategies and methodologies presented point towards further exploration of AI's role in materials exploration. Future work might focus on integrating more accurate DFT functionals like SCAN and expanding the dataset to include defects and lower-dimensional structures. Additionally, the potential for these models to enhance molecular dynamics and Monte Carlo simulations presents an exciting area for further research.

The paper provides a substantial contribution to open scientific resources in materials science, laying a foundation for ongoing advancements capable of addressing both theoretical and practical challenges in the field.
