Universal ML Interatomic Potentials (uMLIPs)
- Universal machine learning interatomic potentials (uMLIPs) are data-driven force fields that accurately model quantum-mechanical potential energy surfaces across diverse chemistries.
- They leverage advanced architectures like graph neural networks and SE(3)-equivariant networks to capture complex atomic interactions with high efficiency.
- Training on vast, diverse DFT datasets enables uMLIPs to accelerate structure prediction, defect energetics, and materials screening by orders of magnitude.
Universal machine-learning interatomic potentials (uMLIPs) are general-purpose, data-driven force fields designed to reproduce the quantum-mechanical potential energy surface (PES) across an exceptionally broad range of chemistries, structures, and atomic environments, while delivering orders-of-magnitude acceleration over density-functional theory (DFT). These models are trained on large, chemically diverse datasets and aim to provide near–first-principles accuracy for energies, forces, and derived properties in simulation protocols such as structure prediction, defect energetics, phase stability, and materials screening. uMLIPs are a key enabler of scalable atomistic modeling in materials discovery, catalysis, energy storage, and beyond.
1. Foundational Principles and Network Architectures
Universal machine learning interatomic potentials are built on the principle of local-energy decomposition: the total energy of a collection of atoms is written as a sum of local atomic (or atomic-environment) contributions,
$$E_{\mathrm{tot}} = \sum_{i} E_i(\mathcal{D}_i),$$
where $E_i$ is a function, implemented via a neural network or kernel regression, of descriptors $\mathcal{D}_i$ that encode the geometric and chemical environment around atom $i$ within some cutoff radius $r_c$. Feature representations typically include one-hot element identity, period/group indices, electronegativity, radii, and learned or physically motivated embeddings.
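The local-energy decomposition can be illustrated with a toy sketch. The pairwise `site_energy` function below is a hypothetical stand-in for the learned per-atom network, not any published model; the cosine cutoff, the functional form, and the default cutoff value are assumptions chosen only for illustration, and periodic boundary conditions are ignored.

```python
import numpy as np

def site_energy(distances, cutoff):
    """Toy per-atom energy: a smooth pair term summed over neighbours.

    Hypothetical stand-in for a learned network; not a real potential.
    """
    # Cosine cutoff so each contribution goes smoothly to zero at r = cutoff.
    fc = 0.5 * (np.cos(np.pi * distances / cutoff) + 1.0)
    return float(np.sum(-fc / distances))

def total_energy(positions, cutoff=3.0):
    """Sum local contributions over all atoms of a non-periodic cluster."""
    positions = np.asarray(positions, dtype=float)
    energy = 0.0
    for i in range(len(positions)):
        # Distances from atom i to every atom (including itself).
        d = np.linalg.norm(positions - positions[i], axis=1)
        # Keep neighbours inside the cutoff, excluding atom i itself.
        mask = (d > 0.0) & (d < cutoff)
        energy += site_energy(d[mask], cutoff)
    return energy

dimer = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
print(total_energy(dimer))
```

Because each contribution depends only on neighbours inside the cutoff, adding an atom far from the dimer leaves the total unchanged, which is the locality property that lets such models scale linearly with system size.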
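Modern uMLIPs are graph neural networks (GNNs) or equivariant message-passing networks, which update node features (atomic environments) by aggregating information from neighbors:
$$h_i^{(t+1)} = U\Big(h_i^{(t)}, \sum_{j \in \mathcal{N}(i)} M\big(h_i^{(t)}, h_j^{(t)}, e_{ij}\big)\Big),$$
where $h_i^{(t)}$ is the feature vector of atom $i$ at layer $t$, $\mathcal{N}(i)$ is its set of neighbors within the cutoff, and $e_{ij}$ denotes edge (interatomic distance and direction) features, with $M$ and $U$ being nonlinear (often multilayer perceptron) functions (An et al., 3 Feb 2026). High body-order (three- and four-body) contributions, angular and distance-dependent basis functions, and explicit equivariance (e.g., SE(3)-equivariance) are frequently leveraged.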
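A single neighbor-aggregation update can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the architecture of any model named here: the message and update functions are single `tanh` layers with hypothetical weight matrices `W_msg` and `W_upd`, only interatomic distances are used as edge features (so the output is rotation-invariant rather than carrying equivariant vector or tensor features), and periodic images are ignored.

```python
import numpy as np

def message_passing_step(h, positions, cutoff, W_msg, W_upd):
    """One update: new h_i from old h_i plus summed messages from neighbours.

    h: (n, d) node features; positions: (n, 3) Cartesian coordinates.
    W_msg: (2*d + 1, k) and W_upd: (d + k, d_out) are illustrative weights.
    """
    h = np.asarray(h, dtype=float)
    positions = np.asarray(positions, dtype=float)
    n = h.shape[0]
    agg = np.zeros((n, W_msg.shape[1]))
    for i in range(n):
        for j in range(n):
            r_ij = np.linalg.norm(positions[j] - positions[i])
            if i == j or r_ij >= cutoff:
                continue
            # Message input: receiver features, sender features, distance.
            m_in = np.concatenate([h[i], h[j], [r_ij]])
            agg[i] += np.tanh(m_in @ W_msg)
    # Update: combine each atom's old features with its aggregated messages.
    return np.tanh(np.concatenate([h, agg], axis=1) @ W_upd)

rng = np.random.default_rng(0)
h0 = rng.normal(size=(3, 4))
pos0 = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
W_msg = rng.normal(size=(9, 4))
W_upd = rng.normal(size=(8, 4))
h1 = message_passing_step(h0, pos0, 2.0, W_msg, W_upd)
```

Summing messages over neighbors makes the update independent of atom ordering, and stacking several such steps widens the effective receptive field beyond the single-layer cutoff.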
Key model examples include:
- M3GNet: Message-passing GNN with explicit three-body encoding and stratified sampling of atomic environments (An et al., 3 Feb 2026).
- CHGNet: Dual-graph network with angle convolutions and optional charge/magnetic embedding (Deng et al., 2024, Yu et al., 2024).
- MACE: Atomic Cluster Expansion–based SE(3)-equivariant GNN capturing up to four-body interactions (Deng et al., 2024, Tahmasbi et al., 23 Dec 2025).
- EquiformerV2: Tensor attention–based transformer with general SE(3)-equivariance for broad chemical transfer (Shuang et al., 5 Feb 2025).
- TeaNet/PFP: Equivariant network operating on combined scalar/vector/tensor features; supports multiple DFT functionals (Shinagawa et al., 9 Mar 2026).
A universal MLIP is characterized by a single set of network parameters (modulo minor heads for multifidelity) and joint training across the periodic table.
2. Dataset Composition and Training Protocols
Universality and transferability are data-driven: the breadth and diversity of the pretraining corpus dictate both chemical coverage and out-of-equilibrium accuracy. State-of-the-art uMLIPs are trained on datasets containing up to hundreds of millions of DFT-labeled configurations (Shuang et al., 5 Feb 2025). Representative datasets include:
- Materials Project Trajectories (MPtrj): Equilibrium/near-equilibrium crystal structures, covering elements and chemistries relevant to inorganic solids.
- OMat24: Extensive non-equilibrium conformations (strained, thermally activated, surface/slab, and high-energy states) (Mehdizadeh et al., 29 Aug 2025, Kraß et al., 16 Jul 2025).
- Alexandria, MatPES: Extended inorganic and molecular reference sets for high configurational diversity.
- Surface and Defects: Targeted DFT data on slabs, adsorbates, vacancies, and other non-bulk motifs.
Loss functions combine MSE on energies, forces, and stresses, with carefully balanced weights:
$$\mathcal{L} = w_E\,\mathrm{MSE}(E) + w_F\,\mathrm{MSE}(\mathbf{F}) + w_\sigma\,\mathrm{MSE}(\boldsymbol{\sigma}).$$
Training protocols employ large-batch stochastic optimization (AdamW/SGD), regularization, learning-rate scheduling, and, in some cases, data augmentation (random distortions, high-T MD, stratified sampling).
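The weighted energy/force/stress objective can be written out for a single structure as follows. This is a sketch under assumptions: the function name `efs_loss`, the dictionary layout, the per-atom normalization of the energy error, and the default weights are all illustrative placeholders, not values or conventions taken from any of the cited models.

```python
import numpy as np

def efs_loss(pred, ref, n_atoms, w_e=1.0, w_f=1.0, w_s=0.1):
    """Weighted sum of squared errors on energy, forces, and stress.

    pred/ref: dicts with 'energy' (scalar), 'forces' (N, 3), 'stress' (3, 3).
    The default weights are placeholders, not values from any paper.
    """
    # Energy error is normalised per atom so large cells do not dominate.
    e_err = (pred["energy"] - ref["energy"]) / n_atoms
    f_err = np.asarray(pred["forces"]) - np.asarray(ref["forces"])
    s_err = np.asarray(pred["stress"]) - np.asarray(ref["stress"])
    return float(w_e * e_err**2
                 + w_f * np.mean(f_err**2)
                 + w_s * np.mean(s_err**2))

ref = {"energy": -12.0, "forces": np.zeros((2, 3)), "stress": np.zeros((3, 3))}
pred = {"energy": -10.0, "forces": np.ones((2, 3)), "stress": np.ones((3, 3))}
print(efs_loss(pred, ref, n_atoms=2))
```

In practice the three weights trade off against each other: forces supply many more labels per structure than energies, so the relative weighting strongly affects which quantity the model fits best.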
The composition and sampling of the training set, especially the inclusion of off-equilibrium and high-energy points, are the dominant factors determining OOD transferability, as shown in large-scale benchmarks for surfaces, defects, and property prediction (Kraß et al., 16 Jul 2025, Mehdizadeh et al., 29 Aug 2025, Xu et al., 4 Dec 2025).
3. Performance Benchmarks, Accuracy Profiles, and Limitations
uMLIPs achieve remarkable accuracy on a spectrum of benchmarks:
- Structure Prediction: Substantial speed-ups ($10^3$–$10^4\times$ per force/energy call) over DFT in crystal-structure prediction (CSP), enabling complex multinary oxide exploration. For example, M3GNet-guided CSP identified new phases in the quaternary Sr–Li–Al–O system faster by a factor of 40, with energy/formation predictions within ∼10 meV/atom of DFT (An et al., 3 Feb 2026).
- Defect and Surface Energetics: State-of-the-art EquiformerV2 and MACE models achieve energy RMSEs < 5 meV/atom and force RMSEs < 100 meV/Å on defect-rich metals, grain boundaries, and random alloys—competitive or superior to system-specific MLIPs (Shuang et al., 5 Feb 2025, Berger et al., 9 Apr 2025, Lebedaa et al., 5 Dec 2025).
- Mechanical/Elastic Moduli: Bulk/shear/Young's modulus errors are typically 15–25 GPa (MAE) vs DFT for high-accuracy models (SevenNet, MACE) across ∼11 000 MP structures (Gao et al., 27 Oct 2025).
- Phonons and Thermodynamics: High-coverage OMat24- or OAM-trained models recover phonon spectra to within 4–8 meV (MAE) and deliver accurate bulk modulus, volume, and heat-capacity predictions for MOFs and other porous frameworks (Kraß et al., 16 Jul 2025).
- Surface Stability and Cleavage Energies: Non-equilibrium pretraining (OMat24) is essential for sub-6% MAPE on cleavage energy benchmarks comprising ∼37 k slabs across 89 elements; equilibrium-only or surface-adsorbate training degrades performance by up to 17× (Mehdizadeh et al., 29 Aug 2025).
- High-Pressure and High-Energy Regimes: Out-of-distribution configurations (high pressure, high temperature, deep compressions) expose systematic blind spots, with energy and volume errors growing by factors of 5–10, unless specifically addressed by fine-tuning or targeted data inclusion (Loew et al., 25 Aug 2025).
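The benchmarks above report three different error measures (MAE, RMSE, and MAPE). For reference, a minimal sketch of how each is computed from paired predictions and DFT values is given below; the function names are illustrative and the example numbers are made up for demonstration only.

```python
import numpy as np

def mae(pred, ref):
    """Mean absolute error, in the units of the inputs."""
    return float(np.mean(np.abs(np.asarray(pred, dtype=float) - np.asarray(ref, dtype=float))))

def rmse(pred, ref):
    """Root-mean-square error; penalises outliers more strongly than MAE."""
    return float(np.sqrt(np.mean((np.asarray(pred, dtype=float) - np.asarray(ref, dtype=float)) ** 2)))

def mape(pred, ref):
    """Mean absolute percentage error; requires non-zero reference values."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return float(100.0 * np.mean(np.abs((pred - ref) / ref)))

# Made-up numbers, purely to exercise the functions.
predicted = [1.0, 3.0]
reference = [2.0, 2.0]
print(mae(predicted, reference), rmse(predicted, reference), mape(predicted, reference))
```

Because MAPE is relative, it suits quantities such as cleavage energies that span a wide range, whereas MAE and RMSE in meV/atom or GPa are not directly comparable across properties.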
A common limitation is "PES softening"—underestimation of PES curvature (phonon frequencies, energy barriers)—due to concentrated near-equilibrium sampling. This systematic error is remediable via minimal fine-tuning (as little as a single high-energy MD structure) or active learning on challenging regimes (Deng et al., 2024).
In all cases, training data diversity and targeted augmentation for the relevant application domains overshadow architectural complexity as the most important determinants of transferability and accuracy (Mehdizadeh et al., 29 Aug 2025, Kraß et al., 16 Jul 2025).
4. Transfer Learning, Fine-Tuning, and Multi-Fidelity Strategies
uMLIPs support rapid adaptation to new tasks through multiple transfer and fine-tuning protocols:
- Elemental Energy Referencing (AtomRef): For transfer across DFT functional hierarchies (e.g., PBE → r2SCAN), a per-element reference is subtracted (fitted to both low- and high-fidelity datasets), and only the residuals are predicted by the GNN (Huang et al., 7 Apr 2025). Freezing the target AtomRef before GNN weight updates is essential for stable and accurate transfer (MAE reductions of ∼14 meV/atom in decomposition/formation energy).
- Δ-Learning and Active Learning: Correction models learn the energy (and force) deviation, $\Delta E = E_{\mathrm{DFT}} - E_{\mathrm{uMLIP}}$, using Gaussian Process Regression or kernel regression on high-dimensional SOAP descriptors. This protocol is highly data-efficient and suitable for integration with global optimization and structure search workflows (Pitfield et al., 24 Jul 2025).
- Knowledge Distillation: Lightweight models (e.g. SevenNet-Nano, 105k parameters) inherit teacher representations by regressing to the outputs of larger foundation models (e.g. SevenNet-Omni, 26M parameters), achieving >10× acceleration with minimal loss of accuracy (energy validation MAEs ∼23 meV/atom, force ∼0.08 eV/Å) (Oh et al., 13 Apr 2026).
- Uncertainty Quantification and Model Distillation: Ensemble-based variance metrics (computed from an inverse-RMSE-weighted ensemble) provide robust uncertainty estimation. Selective DFT labeling driven by high values of this uncertainty metric enables distilled potentials that surpass fully DFT-trained models by filtering label noise (Liu et al., 28 Jul 2025).
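One way to read the inverse-RMSE-weighted ensemble in the last item is sketched below. The exact metric used in the cited work is not reproduced here; the weighting scheme (weights proportional to 1/RMSE), the weighted standard deviation as the uncertainty, and the threshold-based selection are assumptions made for illustration, as are the function names.

```python
import numpy as np

def weighted_ensemble(predictions, rmses):
    """Return the weighted mean and weighted spread of ensemble predictions.

    predictions: (K, M) array, K models by M structures.
    rmses: (K,) validation RMSE per model; weights are proportional to 1/RMSE.
    """
    predictions = np.asarray(predictions, dtype=float)
    w = 1.0 / np.asarray(rmses, dtype=float)
    w /= w.sum()  # normalise so the weights sum to one
    mean = w @ predictions
    var = w @ (predictions - mean) ** 2
    return mean, np.sqrt(var)

def select_for_labeling(spread, threshold):
    """Indices of structures whose ensemble spread exceeds the threshold."""
    return np.flatnonzero(np.asarray(spread) > threshold)

# Two hypothetical models, two structures; the models disagree on the second.
preds = [[1.0, 0.0], [1.0, 2.0]]
mean, spread = weighted_ensemble(preds, rmses=[1.0, 1.0])
to_label = select_for_labeling(spread, threshold=0.5)
```

Structures flagged this way would be sent for DFT labeling, while structures on which the ensemble agrees can keep their ensemble-mean labels.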
Scaling-law analyses show that transfer learning (TL) yields substantial data-efficiency gains—often over an order of magnitude compared to training from scratch—provided that reference shifts are correctly accounted for (Huang et al., 7 Apr 2025).
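The reference shift mentioned here can be made concrete with a least-squares fit of per-element energies, in the spirit of elemental energy referencing. This is a minimal sketch under assumptions: the helper names, the two-element toy data, and the use of a plain linear fit are illustrative and are not taken from the cited implementation.

```python
import numpy as np

def fit_atom_ref(compositions, energies):
    """Least-squares per-element reference energies.

    compositions: (M, n_elements) atom counts per structure.
    energies: (M,) total energies at one level of theory.
    """
    counts = np.asarray(compositions, dtype=float)
    ref, *_ = np.linalg.lstsq(counts, np.asarray(energies, dtype=float), rcond=None)
    return ref

def residual_targets(compositions, energies, ref):
    """Energies left for the network after subtracting the elemental baseline."""
    counts = np.asarray(compositions, dtype=float)
    return np.asarray(energies, dtype=float) - counts @ ref

# Toy data: two elements, four structures, two hypothetical levels of theory
# that differ only by a per-element shift.
counts = [[1, 0], [0, 1], [1, 1], [2, 1]]
e_low = [-1.0, -2.0, -3.0, -4.0]
e_high = [-1.5, -2.5, -4.0, -5.5]
ref_low = fit_atom_ref(counts, e_low)
ref_high = fit_atom_ref(counts, e_high)
res_low = residual_targets(counts, e_low, ref_low)
res_high = residual_targets(counts, e_high, ref_high)
```

In this toy case the two fidelities yield identical residuals once their own baselines are removed, so the network target is unchanged by the functional switch; with real data the residuals differ, but the large composition-dependent offset no longer has to be relearned.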
5. Workflow Automation, Software Infrastructure, and Practical Use Cases
uMLIPs are integrated into automated, high-throughput materials simulation pipelines. The UniMatSim Python framework exposes a unified API across multiple uMLIPs (CHGNet, M3GNet, MACE, etc.), with standardized workflows for optimization, property calculation, stability screening, and 2D/low-D materials handling (Xiang et al., 11 Mar 2026). Key automation features include:
- Task orchestration for relaxation, elasticity, phonon, and MD protocols—with asset checkpointing and dependency tracking.
- Consensus screening, using intersection filters over multiple model variants to reduce stochastic false positives in large-scale stability searches (e.g., multi-model screening of >1000 2D Lieb-lattice prototypes).
- Standardized modules for property evaluation (elastic constants, EOS, Born–Huang criteria, BZ path selection for phonon bands).
- Data/metadata management via Python API, CLI, and RESTful interface; plans for further DB integration and active learning.
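The consensus-screening idea in the list above reduces to a set intersection over per-model results. The sketch below is not the UniMatSim API; the function name, the dictionary layout, and the model and candidate labels are hypothetical and serve only to show the filtering logic.

```python
def consensus_screen(stable_by_model):
    """Keep only candidates that every model flags as stable.

    stable_by_model: dict mapping model name -> iterable of candidate IDs.
    Returns an empty set when no model results are supplied.
    """
    sets = [set(ids) for ids in stable_by_model.values()]
    return set.intersection(*sets) if sets else set()

# Hypothetical screening results from three models.
results = {
    "model_a": ["s1", "s2", "s3"],
    "model_b": ["s2", "s3"],
    "model_c": ["s3", "s4"],
}
survivors = consensus_screen(results)
```

Requiring agreement across models trades recall for precision: a candidate that only one model deems stable is dropped, which reduces false positives at the cost of possibly discarding a true positive that the other models miss.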
Applications documented for uMLIPs include: accelerated de novo CSP in complex oxides (An et al., 3 Feb 2026), high-resolution mapping of defect/interstitial energetics in complex alloys (Lebedaa et al., 5 Dec 2025), rapid large-scale vacancy and etching screening in the Materials Project catalog (Berger et al., 9 Apr 2025), high-throughput MOF optimization and property prediction (Kraß et al., 16 Jul 2025), and supporting global structure optimization in challenging cluster and surface systems (Pitfield et al., 24 Jul 2025).
6. Best Practices, Systematic Limitations, and Paths Forward
Despite their transformative impact, uMLIPs exhibit systematic limitations shaped by both the quantum reference they are trained upon and the diversity of their training sets:
- DFT functional inheritance: Models inherit intrinsic errors from the exchange-correlation functional used for generating training data. For example, M3GNet trained on PBE reverses relative phase stabilities compared to benchmarks with higher-level methods (the meta-GGAs SCAN and r2SCAN, and RPA) (An et al., 3 Feb 2026). Targeting more accurate references (e.g., r2SCAN as in PFP v8) measurably improves agreement with experiment (e.g., melting-point MAE more than halved, from 279 K to 133 K) (Shinagawa et al., 9 Mar 2026).
- PES softening: A consequence of near-equilibrium oversampling is underprediction of PES curvature, observable as lower phonon frequencies, migration barriers, and defect/surface formation energies (Deng et al., 2024, Kraß et al., 16 Jul 2025). Incorporation of high-energy MD, strained, and non-equilibrium geometries in training is essential for robust transfer.
- High-pressure/High-energy OOD regimes: Systematic accuracy loss is observed under compression (up to 150 GPa) and at high temperatures. Fine-tuning on pressure-specific data restores performance to near-zero-pressure levels (Loew et al., 25 Aug 2025).
- Sampling Bottleneck in Structure Prediction: With uMLIPs removing energy/force bottlenecks, the dominant computational cost in CSP workflows has shifted to search efficiency. Evolutionary/genetic algorithms may fail to explore large configuration spaces, limiting the discovery of high-complexity phases (An et al., 3 Feb 2026).
- Latent Feature Diversity and Pretraining Bias: Cross-model comparisons show that uMLIPs encode chemical/structural information in significantly distinct ways; fine-tuned models often retain a strong pre-training bias in their latent features, elucidating the origin of rapid adaptation but also the constraints of prior selection (Chorna et al., 5 Dec 2025).
Recommended practices include cross-validation of predicted ground states with higher-level theory (meta-GGA, RPA), benchmarking out-of-domain predictions with reference calculations, and targeted fine-tuning where application-domain error is critical.
A plausible implication is that, as model architectures saturate representational capacity, further gains in universality and application-specific fidelity will hinge on strategic, physically motivated training set design, multi-fidelity learning, uncertainty quantification, and end-to-end workflow automation.
7. Outlook: Dataset-Centric Development and the Next Generation of uMLIPs
Community-wide benchmarks consistently reveal that data composition, not architectural complexity, determines the dominant error regime and transferability of uMLIPs (Mehdizadeh et al., 29 Aug 2025, Kraß et al., 16 Jul 2025). Models trained on large, non-equilibrium samples (e.g., OMat24, active learning–driven campaigns) outperform those trained solely on equilibrium or surface-adsorbate–specific sets, by up to 17× in surface tasks (Mehdizadeh et al., 29 Aug 2025). Integrating multi-domain, multi-fidelity datasets and routinely applying active learning to application-driven “blind spots” are the key levers for improvement.
A further frontier is harmonizing universal scope with explicit experimental fidelity: training on higher-level references (meta-GGA, hybrid functionals, RPA, and beyond), as demonstrated by r2SCAN-trained PFP v8 (Shinagawa et al., 9 Mar 2026), and extending modular transfer/Δ-learning for rapid, domain-targeted adaptation. The increasing standardization of APIs, workflow engines, and benchmarking suites (for example, UniMatSim and MOFSimBench) ushers in a new era of reproducible, automated, and scalable uMLIP-driven materials modeling.
In sum, universal machine-learning interatomic potentials have rapidly evolved from theoretical constructs to practical foundation models underpinning atomistic simulation and materials discovery, with performance and transferability tightly governed by systematic, data-centric methodology and best practices established by recent large-scale benchmarks (An et al., 3 Feb 2026, Deng et al., 2024, Kraß et al., 16 Jul 2025, Mehdizadeh et al., 29 Aug 2025, Shinagawa et al., 9 Mar 2026).