Extrapolation capabilities of universal MLIPs to out-of-distribution atomic environments

Determine the extent to which universal machine learning interatomic potentials (uMLIPs) pre-trained on large materials datasets reliably extrapolate to out-of-distribution atomic environments across common atomistic modeling tasks, and ascertain the conditions under which their predictions maintain accuracy sufficient for materials discovery and design.

Background

Universal machine learning interatomic potentials (uMLIPs) such as M3GNet, CHGNet, and MACE-MP-0 are pre-trained on large, diverse materials datasets to serve both as broadly applicable force fields and as foundations for fine-tuning. While they show promising performance on near-equilibrium configurations, many practically important atomistic tasks involve out-of-distribution (OOD) environments, including surfaces, defects, solid-solution energetics, phonon modes, and ion migration barriers.

Deng et al. (2024) note that these OOD settings are underrepresented in common pre-training datasets (e.g., Materials Project relaxation trajectories), raising concerns about systematic errors when uMLIPs are applied beyond the training distribution. They therefore identify the need for a systematic understanding of uMLIPs' extrapolative behavior in order to evaluate their real-world applicability in materials discovery and design.
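
As a rough illustration of how such systematic errors might be quantified, the Python sketch below compares predicted and reference forces per task category. Everything in it is an assumption made for illustration: the function name `force_error_summary`, the task labels, and the synthetic arrays are hypothetical and do not come from the paper. In practice the arrays would hold reference (e.g., DFT) forces and uMLIP predictions. The sketch reports the mean absolute error and the slope of a through-origin least-squares fit, where a slope below 1 means forces are underpredicted on average.

```python
import numpy as np


def force_error_summary(f_ref, f_pred):
    """Return (mae, slope) for predicted vs. reference force components.

    slope is the least-squares coefficient of f_pred = slope * f_ref
    (fit through the origin); slope < 1 means underpredicted forces.
    """
    f_ref = np.asarray(f_ref, dtype=float).ravel()
    f_pred = np.asarray(f_pred, dtype=float).ravel()
    mae = float(np.mean(np.abs(f_pred - f_ref)))
    slope = float(np.dot(f_ref, f_pred) / np.dot(f_ref, f_ref))
    return mae, slope


# Synthetic placeholder data (NOT real DFT or uMLIP output).
rng = np.random.default_rng(0)
f_ref = rng.normal(size=(50, 3))  # 50 atoms, 3 force components each

# Hypothetical task categories: an accurate "bulk" case and a
# "surface" case whose forces are uniformly scaled down by 20%.
tasks = {
    "bulk": f_ref + rng.normal(scale=0.01, size=f_ref.shape),
    "surface": 0.8 * f_ref,
}

summary = {name: force_error_summary(f_ref, f_pred)
           for name, f_pred in tasks.items()}

for name, (mae, slope) in summary.items():
    print(f"{name}: MAE = {mae:.4f}, slope = {slope:.3f}")
```

Grouping the summary by task category (surfaces, defects, migration barriers, and so on) is one simple way to see whether errors stay small near equilibrium but grow in OOD environments.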

References

A systematic understanding of the ability of uMLIPs to extrapolate to common atomic-modeling tasks, especially those with atomic environments that are out of distribution (OOD), remains an open question with implications for their real-world applicability in material discovery and design.

Deng et al., "Overcoming systematic softening in universal machine learning interatomic potentials by fine-tuning," arXiv:2405.07105 (11 May 2024), Section 1 (Introduction).