Impact of OMat24 'rattled' structures and architecture dependence of observed pathologies

Determine whether the large proportion of "rattled" (randomly displaced) structures in the OMat24 dataset, and the magnitude of their displacements, is beneficial or detrimental for training current universal machine learning interatomic potentials (MLIPs). Also determine whether the training pathologies observed on diatomic systems with the full OMat24 dataset are specific to less-constrained architectures, which do not enforce energy conservation or rotational equivariance, or whether they also occur in more constrained architectures.

Background

The OMat24 dataset contains approximately 45% "rattled" structures, created by perturbing low-energy configurations. During development, models trained on the full OMat24 dataset (both conservative and direct variants) showed undesirable behavior on out-of-distribution homonuclear diatomic systems, including kinks in energy curves. Filtering high-energy, high-force, and high-stress outliers only partially mitigated these issues, whereas restricting training to the AIMD-only subset resolved them.
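As an illustration only, the two ideas in this paragraph (rattling a structure and inspecting a diatomic energy curve for irregularities) can be sketched in a few lines of NumPy. Everything named here is an assumption rather than the authors' procedure: the source does not state how OMat24 displacements are drawn, so i.i.d. Gaussian noise with a hypothetical `sigma` of 0.05 Å is used; `lennard_jones` is a toy stand-in for a model's predicted dimer energy; and `count_turning_points` is a hypothetical, deliberately crude diagnostic that counts slope sign changes, where a well-behaved binding curve has exactly one (its minimum).

```python
import numpy as np


def rattle(positions, sigma, rng):
    """Return a copy of positions with i.i.d. Gaussian noise of std sigma added.

    Assumption: Gaussian displacements; the source does not specify the
    distribution or magnitude used for OMat24.
    """
    return positions + rng.normal(0.0, sigma, size=positions.shape)


def lennard_jones(r, epsilon=1.0, r0=1.0):
    """Toy dimer energy with its minimum at r0; stands in for a model prediction."""
    x = (r0 / r) ** 6
    return epsilon * (x * x - 2.0 * x)


def count_turning_points(energies):
    """Count sign changes in the finite-difference slope of an energy curve."""
    slope = np.diff(energies)
    signs = np.sign(slope[slope != 0.0])
    return int(np.sum(signs[1:] != signs[:-1]))


rng = np.random.default_rng(0)

# Rattle a two-atom structure (coordinates in angstrom, hypothetical values).
positions = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.1]])
rattled = rattle(positions, sigma=0.05, rng=rng)

# Scan the bond length and compare a smooth curve with an artificially rippled one.
r = np.linspace(0.8, 3.0, 221)
smooth = lennard_jones(r)
rippled = smooth + 0.05 * np.sin(25.0 * r)  # synthetic stand-in for a pathological curve

smooth_turns = count_turning_points(smooth)
rippled_turns = count_turning_points(rippled)
print(smooth_turns, rippled_turns)
```

In practice the `smooth` and `rippled` arrays would be replaced by energies predicted by a trained model along the same bond-length scan. A turning-point count will not catch every kink (a slope discontinuity need not change sign), so it is best read as a quick screen rather than a full smoothness test.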

Given these observations, the authors explicitly note uncertainty about whether the proportion and degree of rattling in OMat24 are beneficial for training universal MLIPs, and whether the observed problems are tied to specific architectural choices (e.g., less-constrained, non-conservative, non-equivariant models) or are more general.

References

Whilst we are broadly in favour of retaining as much of a model's training data as possible, it remains unclear if the large proportion of "rattled" systems in OMat24 (45% of the data), and the amount by which they are rattled, is generally beneficial or not for the current generation of universal MLIPs, or whether the problems we have observed are unique to more unconstrained architectures.

— Orb-v3: atomistic simulation at scale (2504.06231 - Rhodes et al., 8 Apr 2025) in Appendix, Section: Effect of filtering OMat24
