Machine-Learned Interatomic Potentials (MLIPs)

Updated 12 August 2025
  • Machine-learned interatomic potentials (MLIPs) are data-driven models that predict potential energy surfaces with near ab initio accuracy while significantly reducing computational cost.
  • They leverage advanced descriptors and flexible regression methods, including neural networks and kernel-based models, to capture complex atomic interactions in diverse materials.
  • Recent progress in MLIPs emphasizes workflow optimization, uncertainty quantification, and physics-informed loss functions to enhance model accuracy, transferability, and practical applicability.

Machine-learned interatomic potentials (MLIPs) are data-driven surrogate models that predict potential energy surfaces (PES) for collections of atoms with near ab initio accuracy but at a fraction of the computational cost. By fitting flexible functional forms (often neural networks or kernel-based regressors) to quantum-mechanical energy, force, and stress data, MLIPs have enabled ab initio–quality molecular dynamics (MD), structural optimization, and property prediction across a vast chemical and configurational space. Current MLIP methodologies combine advanced descriptors of local atomic environments, scalable regression methods, and automated database generation to deliver predictive power for complex materials, molecules, and interfaces.

1. Model Architectures and Descriptor Paradigms

MLIP development is marked by the interplay between the representation of atomic environments and the flexibility of the statistical learning model. Widely used architectures include:

  • Atom-centered symmetry function models: Early neural MLIPs (e.g., Behler–Parrinello-type high-dimensional neural network potentials (HDNNP), AENET) encode atomic environments with radial/angular symmetry functions, feeding them into per-element feedforward neural networks. They are efficient and maintain rotational, translational, and permutational invariance, but are often limited in compositional and long-range transferability (Choyal et al., 2023).
  • Kernel-based potentials: Models such as the Gaussian Approximation Potential (GAP), Spectral Neighbor Analysis Potential (SNAP), and Moment Tensor Potentials (MTP) utilize descriptors (e.g., the Smooth Overlap of Atomic Positions (SOAP), bispectrum, or moment tensors) with kernel regression (e.g., Gaussian process, kernel ridge regression). The total energy can be written as $E(X) = \sum_i \sum_s \alpha_s K(\mathcal{R}_i, \mathcal{R}_s)$, where $K$ is a kernel acting on atomic environments; see the sketch after this list (Allen et al., 2022, Choyal et al., 2023, Baghishov et al., 6 Jun 2025).
  • Atomic Cluster Expansion (ACE): Linear and nonlinear ACE systematically expand atomic energies in body-ordered polynomials of neighbor environments, providing a physically-motivated yet highly expressive expansion (Leimeroth et al., 5 May 2025).
  • Message Passing Neural Networks (MPNN): E(3)-equivariant GNNs (e.g., NequIP, MACE, Allegro, MatterSim) can encode many-body and long-range interactions while obeying physical symmetries (Sauer et al., 8 Apr 2025, Leimeroth et al., 5 May 2025). MACE, for example, leverages higher-body message functions and tensor products, supporting fast training and accuracy for metals, oxides, and 2D materials (Volkmer et al., 8 Aug 2025).
  • Physically-informed models: Equivariant MLIPs integrating charge redistribution via global charge equilibration (Maruf et al., 23 Mar 2025), or dispersion corrections, broaden applicability to polaronic, ionic, and van der Waals (vdW) materials (Sauer et al., 8 Apr 2025).
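
A minimal sketch of the kernel-based energy model referenced above, assuming per-atom descriptor vectors are already computed (random toy data here). The Gaussian kernel and dense ridge solve stand in for the sparse Gaussian process regression used in GAP, so this illustrates the functional form $E(X) = \sum_i \sum_s \alpha_s K(\mathcal{R}_i, \mathcal{R}_s)$ rather than any particular implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(a, b, sigma=1.0):
    """Squared-exponential kernel between two sets of descriptor vectors."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / sigma**2)

# Toy data: descriptors for the atoms of each training structure, plus a
# small set of representative ("sparse") environments R_s.
n_struct, n_atoms, n_feat, n_sparse = 100, 8, 4, 30
D = rng.normal(size=(n_struct, n_atoms, n_feat))   # per-atom descriptors
sparse_envs = rng.normal(size=(n_sparse, n_feat))
E_ref = np.tanh(D).sum(axis=(1, 2))                # synthetic target energies

# Design matrix Phi[n, s] = sum_i K(R_i^(n), R_s), so that
# E(X) = sum_i sum_s alpha_s K(R_i, R_s) = Phi @ alpha.
Phi = np.stack([gaussian_kernel(D[n], sparse_envs).sum(axis=0)
                for n in range(n_struct)])

# Regularized least squares for the kernel coefficients alpha.
lam = 1e-6
alpha = np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_sparse), Phi.T @ E_ref)

E_pred = Phi @ alpha
print("train energy RMSE:", np.sqrt(np.mean((E_pred - E_ref) ** 2)))
```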

2. Workflow Optimization: Database Generation, Training, and Uncertainty

The efficiency and accuracy of MLIPs are regulated by judicious design of training data and learning protocols:

  • Optimal data generation: Non-diagonal supercells (NDSC) target the uniform sampling of vibrational Brillouin zone points, maximizing force constant matrix (FCM) diversity with minimal DFT calculations. For a $4\times4\times4$ phonon grid, NDSCs with $N$ primitive cells sample the full FCM, avoiding the cubic scaling of large diagonal supercells (Allen et al., 2022).
  • Active and on-the-fly learning: Online active learning, using Bayesian errors or ensemble uncertainty metrics, identifies underrepresented or high-uncertainty regions of configuration space during MLIP-driven MD, triggering new reference calculations only when necessary and achieving up to $98\%$ reduction in ab initio calls (Kang et al., 18 Sep 2024, Volkmer et al., 8 Aug 2025). The acquisition probability is formulated as $P = P(\text{UCE}_k) \cdot P(U)$, where $P(\text{UCE}_k)$ is uncertainty-based and $P(U)$ penalizes energetically unphysical samples; a sketch of this step follows this list.
  • Sub-sampling and data efficiency: Leverage scores or CUR decomposition systematically select training points to maximize feature-space coverage, reducing redundancy and focusing on the most informative configurations, essential for high-precision or expensive ab initio target data (Baghishov et al., 6 Jun 2025).
  • Transfer and multi-fidelity learning: Simultaneous training on low- and high-fidelity data (e.g., GGA and meta-GGA/coupled-cluster) with explicit fidelity encoding in GNN layers leverages broad PES trends and refines only key high-fidelity regions, outperforming transfer learning and $\Delta$-learning in accuracy and stability (Kim et al., 12 Sep 2024).
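
A minimal sketch of the acquisition step described above, assuming a small committee of MLIPs is already trained: committee force disagreement serves as the uncertainty estimate entering $P(\text{UCE}_k)$, and a Boltzmann-like factor on the predicted energy plays the role of $P(U)$. Both functional forms and all thresholds here are illustrative assumptions, not the exact choices of the cited works.

```python
import numpy as np

rng = np.random.default_rng(2)

def acquisition_probability(forces_committee, energies_committee,
                            uce_scale=0.1, e_ref=0.0, kT=0.5):
    """P = P(UCE_k) * P(U): acquire when the committee disagrees,
    but damp acquisition for energetically unphysical samples.

    forces_committee:   (n_models, n_atoms, 3) predicted forces
    energies_committee: (n_models,) predicted total energies
    """
    # Uncertainty: worst-case atom's committee force standard deviation.
    f_std = forces_committee.std(axis=0)          # (n_atoms, 3)
    uce = np.linalg.norm(f_std, axis=1).max()
    p_uce = 1.0 - np.exp(-uce / uce_scale)        # saturating in [0, 1)

    # Energy penalty: Boltzmann-like damping above a reference energy.
    e_mean = energies_committee.mean()
    p_u = min(1.0, np.exp(-(e_mean - e_ref) / kT))
    return p_uce * p_u

# During MLIP-driven MD: evaluate the committee on the current snapshot
# and trigger a new DFT reference calculation with probability P.
forces = rng.normal(0.0, 0.05, size=(4, 64, 3))   # 4-model committee
energies = rng.normal(-3.2, 0.01, size=4)
p = acquisition_probability(forces, energies, e_ref=-3.3)
if rng.random() < p:
    print(f"acquire new DFT label (P = {p:.3f})")
```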

3. Physics-Informed Losses, Robustness, and Generalization

MLIP robustness, physical consistency, and generalization are underpinned by advanced loss function design and error analysis:

  • Physics-Informed Weak Supervision: Augmenting standard supervised energy/force losses with constraints from first-order Taylor expansions and path-independence of conservative forces (e.g., the PITC and PISC loss terms, illustrated in the sketch after this list) enforces energy-force consistency even with sparse reference labels, reducing errors by up to a factor of two (Takamoto et al., 23 Jul 2024):

\mathcal{L}_\mathrm{PITC}(S; \theta) = \ell\left( E(S_{r}; \theta),\, E(S; \theta) - \sum_{i=1}^{N_\mathrm{at}} \langle r_i, F_i(S; \theta)\rangle \right)

  • Global error propagation: Theoretical frameworks quantify how errors in energy, force, and force constant matching (on training domains of size $L$) propagate to predictions (e.g., defect energies, geometries) in large-scale simulations. The resulting bounds justify current best practices of heavy energy weighting and encourage inclusion of higher-order observations in loss functions (Ortner et al., 2022).
  • Adaptive model design: Fisher information matrix (FIM)-guided composable model design iteratively assembles composite MLIPs from basic submodels, balancing classically-inspired expressivity with stability and minimizing condition number, as reflected by FIM eigenspectrum analysis (Wang et al., 27 Apr 2025).
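
A minimal sketch of the PITC term defined above, assuming a generic model object exposing energy(positions) and forces(positions) (a hypothetical interface) and taking $\ell$ to be the squared error: for a small random displacement $r$ of structure $S$, the energy of the displaced structure $S_r$ is compared against the first-order Taylor estimate built from the forces.

```python
import numpy as np

def pitc_loss(model, positions, rng, delta=0.01):
    """First-order Taylor consistency: E(S_r) should match
    E(S) - sum_i <r_i, F_i(S)> for a small random displacement r.

    model is assumed to expose energy(positions) and forces(positions).
    """
    r = delta * rng.normal(size=positions.shape)   # small displacement
    e_s = model.energy(positions)
    f_s = model.forces(positions)                  # (n_atoms, 3)
    e_sr = model.energy(positions + r)

    taylor = e_s - np.sum(r * f_s)                 # E(S) - sum_i <r_i, F_i>
    return (e_sr - taylor) ** 2                    # squared-error choice of l

# Toy model with a harmonic PES, whose forces are exact negative gradients.
class Harmonic:
    def energy(self, x):
        return 0.5 * np.sum(x**2)
    def forces(self, x):
        return -x

rng = np.random.default_rng(3)
x0 = rng.normal(size=(8, 3))
print("PITC term:", pitc_loss(Harmonic(), x0, rng))
```

Because the toy model's forces are exact gradients of its energy, the residual is second order in the displacement and the loss term is near zero; a model with mutually inconsistent energy and force heads would be penalized.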

4. Benchmarking and Application to Complex Chemistries

Systematic benchmarking underpins the choice and deployment of MLIPs for targeted material classes:

  • Structural and electronic property fidelity: For 2D van der Waals heterostructures, dispersion-corrected MLIPs (augmented with D3 correction) match DFT results in interlayer distance (MAD $\sim 0.11$ Å), intralayer distortion (MAD $\sim 13$ mÅ), and band structure RMS errors ($\sim 35$ meV), with errors comparable to XC-functional uncertainty (Sauer et al., 8 Apr 2025). The metrics, computed in the sketch after this list, include

\Delta d_\text{inter}^{(M)} = d^{(M)} - d^{(\mathrm{PW})}

\Delta R_\text{intra}^{(M)} = \sqrt{\frac{1}{N_1 + N_2} \sum_{i, a} \left| \tilde{r}_{a, i}^{(M)} - \tilde{r}_{a, i}^{(\mathrm{PW})} \right|^2}

  • Chemical complexity: In Li-based disordered rocksalt cathodes, atom-centered MLIPs (AENET, GAP, SNAP, qSNAP, MTP) achieve energy RMSE as low as 7.5 meV/atom (AENET) and force RMSE $\sim 0.21$–$0.25$ eV/Å (MTP) for $>10^4$ configurations, and demonstrate compositional and configurational transferability (Choyal et al., 2023).
  • Molecular and molecular crystal modeling: Data-efficient fine-tuning of foundational potentials with as few as $200$ DFT or AIMD reference structures per molecular crystal attains sub-chemical accuracy in sublimation enthalpy (errors $<4$ kJ/mol), incorporating DMC reference corrections, MD/PIMD for anharmonicity and NQE effects, and generalizing to pharmaceuticals like paracetamol and squaric acid (Pia et al., 21 Feb 2025).
  • Magnetic and spin-lattice coupled systems: In Fe–Cr–C, DeePMD MLIPs trained on spin-unpolarized (DP-NM) versus spin-polarized (DP-M) DFT show a dichotomy: DP-M captures static properties (volume, lattice constants) in Fe-rich regimes, while DP-NM better predicts dynamic and collective properties (melting temperatures, viscosity), reflecting the distinction between local static moments and space-time self-averaged high-T paramagnetism. Transfer learning from DP-NM to DP-M reduces computational cost for magnetic MLIPs by over an order of magnitude (Khazieva et al., 25 Jul 2025).
  • Materials design and scale bridging: On-the-fly learned MLIPs (via Bayesian KR regression or MACE) for Al–Mg–Zr alloys enable accurate elastic constant prediction (deviation within a few GPa of ultrasonic measurements) for supercells with up to 2048 atoms, supporting systematic phase-space exploration and foundational input for multiscale modeling (Volkmer et al., 8 Aug 2025).
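
A minimal sketch of the two structural metrics defined above, assuming relaxed Cartesian coordinates from the MLIP ($M$) and a plane-wave reference ($\mathrm{PW}$) with a known layer assignment; interpreting $\tilde{r}$ as positions referenced to each layer's centroid is an assumption made here for illustration.

```python
import numpy as np

def interlayer_distance(pos, layer_idx):
    """Interlayer distance as the gap between the mean z of the two layers."""
    z = pos[:, 2]
    return abs(z[layer_idx == 0].mean() - z[layer_idx == 1].mean())

def delta_d_inter(pos_m, pos_pw, layer_idx):
    # Delta d_inter^(M) = d^(M) - d^(PW)
    return (interlayer_distance(pos_m, layer_idx)
            - interlayer_distance(pos_pw, layer_idx))

def delta_r_intra(pos_m, pos_pw, layer_idx):
    """RMS intralayer distortion over all N1 + N2 atoms, with positions
    referenced to their own layer's centroid (one reading of r-tilde)."""
    def centered(pos):
        out = pos.copy()
        for l in (0, 1):
            out[layer_idx == l] -= pos[layer_idx == l].mean(axis=0)
        return out
    diff = centered(pos_m) - centered(pos_pw)
    return np.sqrt(np.mean(np.sum(diff**2, axis=1)))

# Toy bilayer: 4 atoms per layer, MLIP structure slightly perturbed.
rng = np.random.default_rng(4)
layer_idx = np.array([0, 0, 0, 0, 1, 1, 1, 1])
pos_pw = rng.normal(size=(8, 3))
pos_pw[:, 2] += 3.3 * layer_idx                   # separate the two layers
pos_m = pos_pw + rng.normal(0.0, 0.01, size=(8, 3))
print("delta d_inter (Å):", delta_d_inter(pos_m, pos_pw, layer_idx))
print("delta R_intra (Å):", delta_r_intra(pos_m, pos_pw, layer_idx))
```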

5. Trade-offs: Model Complexity, Cost, Accuracy, and Usability

MLIP design balances three critical axes: prediction fidelity, computational cost, and user accessibility.

  • Model complexity and DFT precision: For a given descriptor set (e.g., the qSNAP parameter $2J_\mathrm{max}$), increased model complexity (higher descriptor order) improves accuracy but raises evaluation cost and requires correspondingly higher DFT reference precision to avoid overfitting noise. Pareto front analysis formalizes the trade-off between cost and fidelity, recommending medium-precision DFT and optimal descriptor order for specific applications (Baghishov et al., 6 Jun 2025).
  • Energy vs force weighting: Because each configuration has one energy and $3N$ force components, loss function weights must be tuned to ensure appropriate influence; empirical results show that higher relative force weights smooth out noise in low-precision DFT and can boost both energy and force accuracy on high-precision tests (Baghishov et al., 6 Jun 2025). A minimal sketch follows this list.
  • Hardware acceleration: GPU-accelerated implementations (via JAX-MD, PyTorch, LAMMPS-KOKKOS) yield $10\times$–$100\times$ speedups, rendering even large message-passing or equivariant models competitive with classical force fields for MD, provided data movement and parallelization are properly managed (Leimeroth et al., 5 May 2025, Brunken et al., 28 May 2025).
  • Model/Software accessibility: Libraries such as MLIP (featuring MACE, NequIP, ViSNet models) deliver modular training, pretrained models, and plug-and-play MD engines compatible with ASE and JAX-MD, lowering the barrier for domain-specific users and developers (Brunken et al., 28 May 2025). Composable, FIM-guided design strategies offer systematic pathways to tune complexity and extensibility for diverse material spaces (Wang et al., 27 Apr 2025).
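
A minimal sketch of the weighting trade-off described in the list above: each structure contributes one energy but $3N$ force components, so the force term is normalized per component and scaled by an explicit weight. The weights and data here are illustrative, not values from the cited study.

```python
import numpy as np

def weighted_loss(e_pred, e_ref, f_pred, f_ref, w_e=1.0, w_f=0.1):
    """Per-batch loss with explicit energy/force weights.

    e_*: (n_struct,) total energies
    f_*: (n_struct, n_atoms, 3) forces -> 3N components per structure
    """
    n_atoms = f_ref.shape[1]
    loss_e = np.mean((e_pred - e_ref) ** 2)
    # Normalize by 3N so the force term is a per-component MSE;
    # w_f then controls its relative influence directly.
    loss_f = np.mean(np.sum((f_pred - f_ref) ** 2, axis=(1, 2)) / (3 * n_atoms))
    return w_e * loss_e + w_f * loss_f

rng = np.random.default_rng(5)
e_ref = rng.normal(size=10)
f_ref = rng.normal(size=(10, 32, 3))
e_pred = e_ref + 0.01 * rng.normal(size=10)
f_pred = f_ref + 0.05 * rng.normal(size=(10, 32, 3))
# Raising w_f relative to w_e emphasizes force matching, which the cited
# results suggest can smooth noise from low-precision DFT references.
print(weighted_loss(e_pred, e_ref, f_pred, f_ref, w_e=1.0, w_f=1.0))
```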

6. Future Directions and Open Problems

Current research trajectories in MLIPs include:

  • Universal and multi-fidelity potentials: Data-efficient multi-fidelity learning enables simultaneous incorporation of low- and high-fidelity data, critical for universal MLIPs or when high-level ab initio data are scarce (Kim et al., 12 Sep 2024). Iterative pretraining frameworks (IPIP) with forgetting mechanisms further mitigate local minima and data bias, achieving $>80\%$ error reduction over general-purpose force fields and $4\times$ simulation speedup on challenging systems (e.g., Mo–S–O) (Cui et al., 27 Jul 2025).
  • Explicit quantum effects and NQE: For protonic, hydrogen-bonded, or light-element systems, MLIP workflows now integrate path integral MD (PIMD) and quantum benchmark corrections (e.g., DMC) for NQE and "chemical accuracy" in condensed phases (Pia et al., 21 Feb 2025).
  • Spin-lattice coupled and long-range effects: The extension to systems with explicit charge transfer or spin degrees of freedom is realized via charge-equilibration-enabled equivariant MLIPs (Maruf et al., 23 Mar 2025), transfer-learning protocols for magnetic subspaces (Khazieva et al., 25 Jul 2025), and incorporation of global and local electronic predictors.
  • Active learning and uncertainty quantification: AL schemes based on ensemble or Bayesian uncertainties are critical for ensuring reliability in anharmonic or rare-event regimes, where conventional test-set errors are insufficient proxies for true out-of-sample behavior (Kang et al., 18 Sep 2024).
  • Transferability and extrapolability: The challenge of generalization is being addressed via theoretical error propagation frameworks, weighted multi-property loss functions, and dedicated benchmarking of outlier/rare events across complex chemical spaces (Ortner et al., 2022, Leimeroth et al., 5 May 2025, Choyal et al., 2023).

MLIPs have become the enabling technology for large-scale, high-accuracy atomistic simulation of materials, molecules, and interfaces. Best practices in data selection, model design, and uncertainty quantification are being continuously refined to meet the demands of chemical complexity, physical fidelity, and practical computation. The field is advancing rapidly towards foundational, fully transferable, and physically robust interatomic potentials, with continuous integration of methodological innovations and cross-disciplinary benchmarks.
