Impact of OMat24 'rattled' structures and architecture dependence of observed pathologies
Determine whether including a large proportion of "rattled" (randomly displaced) structures in the OMat24 dataset, and the specific magnitude of those displacements, is beneficial or detrimental for training current universal machine learning interatomic potentials (MLIPs). Also ascertain whether the training pathologies observed for diatomic systems when using the full OMat24 dataset are specific to less-constrained architectures, i.e. those that do not enforce energy conservation or rotational equivariance, or whether they also occur for more constrained architectures.
References
Whilst we are broadly in favour of retaining as much of a model's training data as possible, it remains unclear whether the large proportion of "rattled" systems in OMat24 (45\% of the data), and the amount by which they are rattled, is generally beneficial for the current generation of universal MLIPs, or whether the problems we have observed are unique to less constrained architectures.