Universal Machine-Learned Potentials
- Universal machine-learned potentials are data-driven models that predict atomic energies, forces, and derived properties with near-quantum-mechanical accuracy.
- They leverage advanced local descriptors and symmetry-preserving architectures, such as atom-centered symmetry functions and graph-based representations, to capture many-body interactions.
- Recent advances include hybrid physics-ML integration, active sampling, and fine-tuning strategies that enhance transferability and address training data imbalances.
Universal machine-learned potentials (UMLPs) are data-driven models that aim to describe interatomic interactions with near-quantum-mechanical accuracy for a chemically and structurally diverse set of systems, leveraging large-scale machine learning frameworks, symmetry-aware architectures, and heterogeneous datasets. They are designed with the ambition of serving as “foundation models” for atomistic simulations, capable of accurately predicting energies, forces, and derived properties (e.g., phonons, elastic moduli, defect energetics) for molecules, condensed phases, alloys, biomolecules, and interfaces across the periodic table. Recent research has established methodological, algorithmic, and application-level advances but also highlighted inherent challenges related to training data coverage, architecture design, and transferability for specialized tasks.
1. Architectural Principles, Representations, and Key Formulations
UMLPs are characterized by the use of advanced local environment descriptors and machine learning architectures that capture complex many-body interactions and physical symmetries:
- Local Environment Descriptors: Approaches span atom-centered symmetry functions, many-body polynomials (e.g., permutationally invariant polynomials, PIPs), atomic cluster expansions (ACE), and graph-based descriptors. For example, in the NEP and UNEP-v1 architectures, the site energy for atom $i$ of species $s$ is
$$U_i = \mathcal{N}_s(\mathbf{q}_i;\,\mathbf{w}_s),$$
where $\mathbf{q}_i$ is a species-dependent descriptor vector computed from polynomial expansions of local neighbor distances and angles, and $\mathbf{w}_s$ are the neural network weights for species $s$ (Song et al., 2023, Liang et al., 30 Apr 2025).
- Symmetry Preservation: Modern UMLPs universally enforce invariance (or equivariance) to translation, rotation, and permutation. Equivariant message-passing architectures (e.g., MACE, CHGNet, ICTP) use symmetrized tensor products, spherical harmonics, and Clebsch–Gordan coefficients to guarantee proper transformation under symmetry operations (Shiota et al., 28 Feb 2024, Zaverkin et al., 14 Aug 2025).
- Universal Additivity: The total energy is usually decomposed as a sum of per-atom, per-monomer, or per-site energies,
$$E_{\mathrm{tot}} = \sum_i U_i,$$
or, in monomer-centered frameworks, as a sum over perturbed monomer energies,
$$E_{\mathrm{tot}} = \sum_m \tilde{E}_m,$$
where $\tilde{E}_m$ reflects the chemically meaningful energy contribution of a molecular fragment (Yu et al., 30 Nov 2024).
- Hybridization with Physical Models: To enhance transferability and stability, especially for out-of-sample configurations (e.g., high-energy close contacts), empirical physics-based terms are incorporated. For instance, the ZBL potential is fused with MLFF models for improved short-range repulsion,
$$U_{ij}^{\mathrm{ZBL}} = \frac{Z_i Z_j e^2}{4\pi\varepsilon_0\, r_{ij}}\,\phi(r_{ij}/a)\, f_{\mathrm{cut}}(r_{ij}),$$
with the screening function $\phi$ and cutoff function $f_{\mathrm{cut}}$ ensuring a rapid transition to the machine-learned description (Yan et al., 22 Apr 2025). A minimal sketch combining the additive and ZBL ingredients follows this list.
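To make these building blocks concrete, here is a minimal NumPy sketch, not any specific published implementation: per-species one-hidden-layer site-energy networks summed over atoms (additivity), plus a ZBL screened-Coulomb repulsion blended in through a cosine switching function (hybridization). Function names, network shapes, and switching radii are illustrative assumptions.

```python
import numpy as np

def site_energy(q, params):
    """Site energy U_i = N_s(q_i; w_s): one hidden tanh layer, linear readout."""
    W1, b1, w2, b2 = params                     # hypothetical per-species weights
    h = np.tanh(W1 @ q + b1)
    return float(w2 @ h + b2)

def zbl_pair(r, Zi, Zj):
    """Universal ZBL screened-Coulomb repulsion (energy in eV, r in Angstrom)."""
    a = 0.46850 / (Zi**0.23 + Zj**0.23)         # universal screening length
    x = r / a
    phi = (0.18175 * np.exp(-3.19980 * x) + 0.50986 * np.exp(-0.94229 * x)
           + 0.28022 * np.exp(-0.40290 * x) + 0.02817 * np.exp(-0.20162 * x))
    return 14.399645 * Zi * Zj / r * phi        # e^2/(4*pi*eps0) ~= 14.4 eV*Angstrom

def switch(r, r_on=1.0, r_off=2.0):
    """Cosine cutoff f_cut: 1 below r_on, 0 above r_off (radii are assumptions)."""
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    t = (r - r_on) / (r_off - r_on)
    return 0.5 * (1.0 + np.cos(np.pi * t))

def total_energy(descriptors, species, nn_params, pairs):
    """E_tot = sum_i N_{s_i}(q_i) + sum_{i<j} f_cut(r_ij) * U_ZBL(r_ij)."""
    e_ml = sum(site_energy(q, nn_params[s]) for q, s in zip(descriptors, species))
    e_rep = sum(switch(r) * zbl_pair(r, Zi, Zj) for r, Zi, Zj in pairs)
    return e_ml + e_rep
```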
2. Training Methodologies, Data Strategies, and Universality
The efficacy and scope of UMLPs are strongly determined by their training data and optimization strategies:
- Data Curation: Effective universality requires training on large, chemically diverse, and structurally heterogeneous datasets. NEP89, for example, aggregates >110 million inorganic structures (OMAT24), organic/biomolecular systems (SPICE, ANI-1xnr), reactive mixtures, and water phases, with careful homogenization of reference energies and uniform treatment of dispersion corrections (Liang et al., 30 Apr 2025).
- Active Sampling & Metadynamics: Novel data acquisition strategies are employed to diversify training configurations, avoiding Boltzmann over-sampling of low-energy basins. In G-metaD sampling, the atomic environment descriptor (e.g., the atom-centered symmetry-function vector $\mathbf{G}$) is used as the collective variable for metadynamics, driving the system into unexplored regions of chemical space via a history-dependent bias
$$V_{\mathrm{bias}}(\mathbf{G}) = \sum_t h\,\exp\!\left(-\frac{\|\mathbf{G}-\mathbf{G}_t\|_{\mathbf{M}}^2}{2\sigma^2}\right),$$
with $\mathbf{M}$ encoding the metric in descriptor space (Yoo et al., 2020); see the sketch after this list.
- Evolutionary and Fine-Tuning Algorithms: Optimization of large UMLP parameter sets may utilize evolutionary approaches, such as separable natural evolution strategy (SNES) in UNEP-v1, employing species-wise losses and active correction on poorly predicted configurations. Fine-tuning "predictor-corrector" schemes further adapt foundation models to target domains by retraining with system-specific data, often accelerating convergence and surpassing models trained from scratch (Song et al., 2023, Liu et al., 9 Jun 2025, Liu et al., 27 Jun 2025).
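The descriptor-driven biasing can be sketched schematically as follows; this assumes Gaussian hills with an optional diagonal metric in descriptor space, with hill height, width, and the class name chosen for illustration rather than taken from the cited implementation.

```python
import numpy as np

class DescriptorMetaD:
    """Schematic G-metaD bias: deposit Gaussian hills in descriptor space so
    the dynamics is pushed away from already-visited atomic environments."""

    def __init__(self, height=0.1, width=0.5, metric=None):
        self.h, self.w = height, width   # hill height (eV) and width (assumed)
        self.metric = metric             # diagonal metric M; None -> Euclidean
        self.hills = []                  # descriptor vectors G_t of past hills

    def deposit(self, G):
        """Record the current descriptor vector as a new hill center."""
        self.hills.append(np.asarray(G, dtype=float))

    def bias(self, G):
        """V_bias(G) = sum_t h * exp(-||G - G_t||_M^2 / (2 w^2))."""
        G = np.asarray(G, dtype=float)
        v = 0.0
        for Gt in self.hills:
            d = G - Gt
            d2 = float(np.sum(self.metric * d * d)) if self.metric is not None else float(d @ d)
            v += self.h * np.exp(-d2 / (2.0 * self.w ** 2))
        return v
```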
3. Explicit Long-Range Interactions and Polarizability
Classical cutoff-based ML potentials often fail to capture essential long-range effects; recent UMLP frameworks resolve this via explicit modeling:
- Polarizable Long-Range Schemes: Frameworks incorporate charge-equilibration (PQEq) methods and explicit two-body dispersion (e.g., D3, D4), writing the total energy as
$$E_{\mathrm{tot}} = E_{\mathrm{ML}} + \frac{1}{2}\sum_{i\neq j} q_i J_{ij} q_j + E_{\mathrm{disp}},$$
where the $q_i$ are dynamically adjusted partial charges and $J_{ij}$ encodes screened electrostatic interactions (Gao et al., 17 Oct 2024); a charge-equilibration sketch follows this list.
- Biomolecular Simulations: Explicit long-range electrostatics and dispersion corrections have been benchmarked in large-scale protein and water simulations, revealing nuanced dependencies on model size, training set composition, and simulation observables. Improvements in RMSE do not uniformly lead to enhanced macroscopic properties, while inclusion of explicit electrostatics can impact conformational distributions in flexible systems (Zaverkin et al., 14 Aug 2025).
- Field-Responsive Potentials: Models such as FIREANN integrate pseudo field vector-dependent features directly into atomic descriptors, providing explicit dependence on external fields and yielding correct predictions for dipoles, polarizabilities, and field-induced phenomena in both periodic and molecular systems (Zhang et al., 2023).
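A minimal sketch of the charge-equilibration step and the resulting energy decomposition, assuming a quadratic charge model minimized under total-charge conservation via a Lagrange multiplier; the electronegativities chi and the hardness/screened-Coulomb matrix J are taken as given, and all names are illustrative.

```python
import numpy as np

def equilibrate_charges(chi, J, total_charge=0.0):
    """Minimize E(q) = chi.q + 0.5 q.J.q subject to sum(q) = Q.
    Stationarity gives J q + lambda*1 = -chi plus the charge constraint,
    solved here as one bordered linear system."""
    chi = np.asarray(chi, dtype=float)
    n = chi.size
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = J
    A[:n, n] = 1.0                     # Lagrange-multiplier column
    A[n, :n] = 1.0                     # charge-conservation row
    b = np.concatenate([-chi, [total_charge]])
    sol = np.linalg.solve(A, b)
    return sol[:n]                     # equilibrated partial charges q_i

def total_energy(e_ml, chi, J, e_disp=0.0):
    """E_tot = E_ML + E_elec({q_i}) + E_disp with q_i from charge equilibration."""
    chi = np.asarray(chi, dtype=float)
    q = equilibrate_charges(chi, J)
    e_elec = float(chi @ q + 0.5 * q @ J @ q)
    return e_ml + e_elec + e_disp, q
```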
4. Assessment and Benchmarking: Transferability, Limitations, and Correction
Systematic benchmarking establishes the accuracy and domain of universality:
- Materials Science Benchmarks: Large-scale assessments across >10,000 materials show that leading UMLPs (e.g., NEP89, UNEP-v1, MACE, CHGNet) reproduce energies, forces, lattice and elastic properties, phonon spectra, and defect formation energies with MAEs as low as 0.044 eV/atom (formation energies) and phonon-frequency MAEs of several meV (Song et al., 2023, Loew et al., 21 Dec 2024, Liang et al., 30 Apr 2025, Yu et al., 8 Mar 2024).
- Surfaces and Defects: Foundational models trained predominantly on bulk DFT data exhibit pronounced error increases for "out-of-domain" structures such as surfaces, low-coordination motifs, and extended defect systems. Error magnitudes in surface energies are correlated with the descriptor-space distance from the training set. Fine-tuning on modest surface datasets, active learning, and dataset diversification have been demonstrated to lower these errors significantly (Focassio et al., 7 Mar 2024, Berger et al., 9 Apr 2025).
- Alloy Thermodynamics: For subtle properties—such as mixing enthalpies in binaries and multicomponent alloys—the error in small energy differences may exceed chemical accuracy, occasionally predicting even the wrong sign of mixing energies. Supplementing training with sparse, system-specific DFT calculations can restore correct trends, and UMLPs excel at DFT-accelerated structure relaxation (Casillas-Trujillo et al., 25 Jun 2024).
- Phonon Properties: Only certain architectures (e.g., MatterSim, MACE, SevenNet) sufficiently preserve the energy-force relationship needed for accurate phonons. Architectures that predict forces directly, rather than deriving them as energy gradients, are shown to be ill-suited for vibrational property prediction, leading to unphysical imaginary frequencies and large errors (Loew et al., 21 Dec 2024).
- Chemical Property Prediction and Transfer Learning: Intermediate neural descriptors from UMLPs (e.g., MACE, M3GNet) have been successfully repurposed as fixed-length, transferable feature vectors for downstream chemical property prediction tasks (e.g., NMR chemical shifts), using kernel ridge regression and quantum kernel approaches, yielding accuracy similar or superior to traditional SOAP/FCHL descriptors with much-reduced dimensionality (Shiota et al., 28 Feb 2024).
- Active Learning and Δ-Correction: In global optimization and structure search, the combination of a universal surrogate with sparse Gaussian process regression Δ-models (using SOAP descriptors) enables iterative, on-the-fly improvement. This approach, coupled with structure-search algorithms such as replica exchange (REX) and GOFEE, leads to robust identification of DFT global minima even in challenging cluster and interface systems (Pitfield et al., 24 Jul 2025). A minimal Δ-correction sketch follows this list.
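The Δ-correction idea can be illustrated with a short sketch that regresses the residual between reference (DFT) and surrogate (UMLP) energies; for brevity it uses scikit-learn's dense GaussianProcessRegressor rather than the sparse GPR of the cited work, and the feature vectors X stand in for (e.g., averaged) SOAP descriptors.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_delta_model(X_train, e_umlp_train, e_dft_train):
    """Fit a GPR Delta-model on the residual E_DFT - E_UMLP."""
    residual = np.asarray(e_dft_train) - np.asarray(e_umlp_train)
    kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-4)
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gpr.fit(X_train, residual)
    return gpr

def corrected_energy(gpr, X, e_umlp):
    """E ~= E_UMLP + Delta(X); the predictive std can flag structures
    worth promoting to new DFT single points (active learning)."""
    delta, std = gpr.predict(X, return_std=True)
    return np.asarray(e_umlp) + delta, std
```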
5. Adaptability, Fine-Tuning, and Lifelong Learning
Adaptation from universal to application-specific performance is accomplished via several complementary mechanisms:
- Predictor-Corrector Fine-Tuning: Pre-trained UMLPs provide robust initializations, and fine-tuning rapidly improves accuracy on task-specific datasets, often outperforming models trained from scratch and reducing outlier errors in lattice parameters, defect energies, elastic constants, and stacking fault energies (Liu et al., 9 Jun 2025, Liu et al., 27 Jun 2025); a schematic fine-tuning loop follows this list.
- Continual/Lifelong Learning: Lifelong MLPs dynamically integrate new quantum chemical data discovered during exploration (e.g., in chemical reaction network searches) using adaptive data selection, rehearsal strategies, and stability-plasticity balancing to avoid catastrophic forgetting and to reach chemical accuracy unattainable by static universal models. Adaptive loss weighting and dynamic sampler heuristics are crucial for ensuring both fast learning and retention of prior knowledge (Eckhoff et al., 16 Apr 2025).
- Domain-Specific Foundation Models and Model Distillation: The direction toward domain-specialized foundation UMLPs and model distillation aims to further enhance computational efficiency, reduce memory footprint, and improve transferability to materials classes such as 2D materials, solid electrolytes, and molecular crystals (Liu et al., 9 Jun 2025, Liu et al., 27 Jun 2025).
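Schematically, predictor-corrector fine-tuning amounts to restarting optimization from the pretrained foundation weights on a small domain-specific dataset, typically with a reduced learning rate and a weighted energy-force loss. The PyTorch sketch below assumes a model returning (energy, forces) per batch and batches keyed by "energy" and "forces"; it illustrates the scheme rather than the API of any particular UMLP package.

```python
import torch

def fine_tune(model, loader, epochs=50, lr=1e-4, force_weight=10.0):
    """Fine-tune a pretrained UMLP on system-specific reference data."""
    # Optionally freeze early representation layers to retain foundation knowledge:
    # for p in model.embedding.parameters(): p.requires_grad_(False)
    opt = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=lr)
    for _ in range(epochs):
        for batch in loader:
            e_pred, f_pred = model(batch)                    # assumed model signature
            loss = (torch.nn.functional.mse_loss(e_pred, batch["energy"])
                    + force_weight
                    * torch.nn.functional.mse_loss(f_pred, batch["forces"]))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```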
6. Current Challenges and Future Directions
Despite rapid progress, key challenges persist in the deployment and further development of UMLPs:
- Training Data Imbalances: Gaps in coverage for surface, defect, high-energy, and out-of-equilibrium structures hinder transferability and robustness. Imbalanced representation of compositional and vibrational degrees of freedom in large datasets can lead to poorly extrapolated properties in biomolecular and interfacial systems (Zaverkin et al., 14 Aug 2025, Focassio et al., 7 Mar 2024).
- Physical Property Coverage versus Efficiency: The expansion of physical properties (e.g., long-range electrostatics, phonons) imposes trade-offs in computational efficiency and model complexity. There is a recognized need for architecture innovations that reconcile the speed of direct force prediction with the rigorous accuracy of gradient-derived forces for second-derivative properties (Loew et al., 21 Dec 2024, Gao et al., 17 Oct 2024).
- Universal versus Lifelong Learning: While universal models provide an efficient starting point, chemical accuracy in reactive or unexplored regions often necessitates continual adaptation, raising questions about the optimal balance between foundation modeling, active learning, and on-the-fly Δ-correction (Eckhoff et al., 16 Apr 2025, Pitfield et al., 24 Jul 2025).
- Algorithmic and Workflow Integration: Seamless integration of UMLPs and their fine-tuned variants into quantum/classical simulation workflows (ASE, LAMMPS, RBMD) and their interoperability for high-throughput screening, structure prediction, and defect analysis are ongoing areas of development (Berger et al., 9 Apr 2025, Liu et al., 27 Jun 2025); see the ASE sketch after this list.
- Evaluative Protocols: Standardized benchmarks and evaluation practices for simulation observables (e.g., phase transitions, vibrational spectra, conformational sampling) are needed to better assess model fitness beyond RMSEs on static test sets (Zaverkin et al., 14 Aug 2025).
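As an illustration of workflow integration, any UMLP exposing ASE's Calculator interface can serve as a drop-in surrogate for DFT in relaxation and screening pipelines. The UMLPCalculator import and model path below are hypothetical placeholders for whichever calculator (e.g., those shipped with MACE, CHGNet, or NEP codes) is actually in use.

```python
from ase.build import bulk
from ase.optimize import BFGS

from my_umlp import UMLPCalculator  # hypothetical UMLP calculator package

# Build a 2x2x2 fcc Al supercell and attach the (foundation or fine-tuned) UMLP.
atoms = bulk("Al", "fcc", a=4.05) * (2, 2, 2)
atoms.calc = UMLPCalculator(model="path/to/model")  # placeholder model file

print("E     =", atoms.get_potential_energy(), "eV")
print("F_max =", abs(atoms.get_forces()).max(), "eV/Angstrom")

# Relax atomic positions with the UMLP as the energy/force engine.
BFGS(atoms, logfile=None).run(fmax=0.01)
```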
References to Key Papers
- (Yoo et al., 2020) Descriptor-based metadynamics for data sampling.
- (Liu et al., 2021) Neural network universal approximators in molecular modeling.
- (Zhang et al., 2023) Field-responsive universal ML potentials (FIREANN).
- (Song et al., 2023) UNEP-v1: General-purpose NEP for elemental metals/alloys.
- (Shiota et al., 28 Feb 2024) GNN transfer learning for scalable chemical property prediction.
- (Focassio et al., 7 Mar 2024) Universal foundation models: strengths and out-of-domain challenges for surfaces.
- (Yu et al., 8 Mar 2024) uMLIP benchmarking over materials classes.
- (Casillas-Trujillo et al., 25 Jun 2024) Mixing enthalpy accuracy and retraining in alloys.
- (Gao et al., 17 Oct 2024) Polarizable long-range interaction enhancement in UMLPs.
- (Yu et al., 30 Nov 2024) Monomer-centered MB-PIPNet: balance of accuracy, interpretability, and speed.
- (Loew et al., 21 Dec 2024) Phonon benchmarking across universal MLIP architectures.
- (Xia et al., 11 Feb 2025) Historical evolution and review of MLP approaches.
- (Berger et al., 9 Apr 2025) Defect screening and 2D etching with UMLIPs.
- (Eckhoff et al., 16 Apr 2025) Lifelong vs. universal models in chemical reaction networks.
- (Yan et al., 22 Apr 2025) Hybrid empirical–ML potentials for robustness and training efficiency.
- (Liang et al., 30 Apr 2025) NEP89: Empirical-potential speed, high accuracy across 89 elements.
- (Liu et al., 9 Jun 2025, Liu et al., 27 Jun 2025) Fine-tuning foundation models: strategies and tutorial.
- (Pitfield et al., 24 Jul 2025) Active Δ-learning for global structure optimization.
- (Zaverkin et al., 14 Aug 2025) Biomolecular simulation: model size, data composition, and long-range effects.
Universal machine-learned potentials represent a convergent development at the intersection of machine learning, computational physics, chemistry, and materials science. Their ability to provide a transferable, high-fidelity description of atomic interactions, together with the capacity for rapid adaptation via fine-tuning or lifelong learning, positions them as central tools in next-generation atomistic simulation and discovery. Continuing progress will depend on rigorous expansion of training data coverage, further innovations in architecture, integration of explicit physical interactions, and development of robust, standardized workflows and benchmarks for diverse scientific applications.