Effect of "Rattled" Structures in OMat24 on Universal MLIPs and Architecture Dependence

Determine whether the large proportion of rattled structures in the OMat24 dataset (approximately 45% of the data), and the magnitude of the rattling applied to them, are generally beneficial for the current generation of universal machine-learning interatomic potentials (MLIPs), or whether the adverse behaviors observed when training on the full OMat24 dataset are unique to more unconstrained MLIP architectures.

Background

OMat24 combines approximately 100M DFT datapoints, roughly half derived from ab initio molecular dynamics (AIMD) and half from “rattling” low-energy structures. During development, the authors observed undesirable out-of-distribution behavior for homonuclear diatomics when training models on the full OMat24 dataset, as evidenced by kinks in the diatomic energy curves.
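One simple way to flag a suspicious diatomic curve is to count how often the slope of the energy changes sign as the bond is stretched. The sketch below is illustrative only and is not the authors' diagnostic: the function names, the Morse reference curve, and the artificial bump are all assumptions, and a real check would evaluate an MLIP on a homonuclear dimer at each separation instead of using an analytic curve.

```python
import numpy as np

def count_slope_sign_changes(r, energy):
    """Count sign changes of dE/dr along a diatomic energy curve.

    A well-behaved single-well curve has one sign change (at the minimum);
    extra sign changes are a crude proxy for kinks or spurious wells.
    """
    slope = np.gradient(energy, r)      # finite-difference derivative
    signs = np.sign(slope)
    signs = signs[signs != 0]           # ignore exactly flat points
    return int(np.count_nonzero(signs[1:] != signs[:-1]))

def morse(r, depth=1.0, width=2.0, r0=1.2):
    """Analytic stand-in for a smooth dimer curve (hypothetical parameters)."""
    return depth * (1.0 - np.exp(-width * (r - r0))) ** 2 - depth

r = np.linspace(0.5, 6.0, 200)
smooth = morse(r)
# Artificial Gaussian bump that mimics an unphysical feature in the curve.
bumpy = smooth + 0.3 * np.exp(-0.5 * ((r - 2.5) / 0.1) ** 2)

print(count_slope_sign_changes(r, smooth))
print(count_slope_sign_changes(r, bumpy))
```

The smooth curve should report a single sign change, while the curve with the bump reports more. This proxy only catches features large enough to reverse the slope; gentler kinks would need a curvature-based test.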

They experimented with filtering based on extreme energies, forces, and stress, which partially improved behavior, but found that using only the AIMD subset eliminated the issue. This led to uncertainty about whether the high proportion and magnitude of rattling are generally helpful for universal MLIPs or whether the observed problems are specific to more unconstrained architectures.
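As a rough illustration of the kind of filtering described above, the sketch below builds a keep/drop mask from per-structure energy, force, and stress extremes. The function name, the choice of statistics (absolute energy per atom, largest per-atom force norm, largest absolute stress component), and every threshold value are assumptions made for this example; the source does not state the criteria or cutoffs that were actually used.

```python
import numpy as np

def keep_mask(energy_per_atom, forces, stress, e_max, f_max, s_max):
    """Return a boolean mask that is True for structures within all limits.

    energy_per_atom: sequence of floats, one per structure
    forces: sequence of (n_atoms, 3) arrays, one per structure
    stress: sequence of stress arrays (any shape), one per structure
    e_max, f_max, s_max: hypothetical thresholds, in the dataset's units
    """
    energy = np.abs(np.asarray(energy_per_atom, dtype=float))
    # Largest per-atom force magnitude in each structure.
    max_force = np.array(
        [np.linalg.norm(np.asarray(f, dtype=float), axis=1).max() for f in forces]
    )
    # Largest absolute stress component in each structure.
    max_stress = np.array(
        [np.abs(np.asarray(s, dtype=float)).max() for s in stress]
    )
    return (energy <= e_max) & (max_force <= f_max) & (max_stress <= s_max)

# Toy data: the second structure has an extreme energy,
# the third has extreme forces.
energies = [-3.0, 50.0, -2.0]
forces = [np.zeros((2, 3)), np.zeros((2, 3)), np.full((2, 3), 100.0)]
stresses = [np.zeros(6), np.zeros(6), np.zeros(6)]

mask = keep_mask(energies, forces, stresses, e_max=10.0, f_max=20.0, s_max=1.0)
print(mask)
```

Only the first toy structure passes all three limits. A threshold filter of this kind removes outliers but leaves the overall share of rattled structures largely intact, which is consistent with the report that filtering helped only partially.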

References

Whilst we are broadly in favour of retaining as much of a model's training data as possible, it remains unclear if the large proportion of "rattled" systems in OMat24 (45% of the data), and the amount by which they are rattled, is generally beneficial or not for the current generation of universal MLIPs, or whether the problems we have observed are unique to more unconstrained architectures.

Orb-v3: atomistic simulation at scale (2504.06231 - Rhodes et al., 8 Apr 2025) in Appendix, Section: Effect of filtering OMat24