ML Interatomic Potentials

Updated 16 January 2026
  • Machine-learning interatomic potentials are mathematical models that approximate potential energy surfaces for atomistic simulations with near first-principles accuracy.
  • These models use symmetry-preserving descriptors and advanced regression techniques, such as graph neural networks and Gaussian approximation potentials, to capture atomic interactions.
  • Robust MLIPs integrate diverse training data and physics-informed strategies to balance accuracy, transferability, and computational efficiency across various material systems.

Machine-learning interatomic potentials (MLIPs) are mathematical models designed to approximate potential energy surfaces for atomic-scale simulations with near first-principles accuracy at a computational cost orders of magnitude below that of ab initio quantum methods. MLIPs form the backbone of modern molecular dynamics, structural optimization, and materials design workflows, enabling predictive, data-driven simulation of inorganic, organic, and disordered systems from the nanometer to micrometer scale. The rapid evolution of MLIP formalisms (including symmetry-preserving local descriptors, differentiable regression architectures, and physics-informed training strategies) has catalyzed breakthroughs in transferability, data efficiency, and high-throughput applications across chemistry, condensed matter, and metallurgy.

1. Core Theory and Descriptor Formulations

MLIPs decompose the total system energy into a sum over atom-centered energies:

$$E_{\mathrm{tot}} = \sum_{i=1}^{N} E_i(\mathcal{X}_i)$$

where $\mathcal{X}_i$ encodes the local atomic environment within a cutoff $r_c$ as a high-dimensional descriptor vector. The most widespread descriptor families include:

Descriptor dimensionality, radial and angular cutoff radii, and symmetry constraints are hyperparameters set per chemical system and model architecture.
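As a minimal sketch of the atom-centered decomposition, the following Python computes a toy radial descriptor for each atom and sums the per-atom energies. The Gaussian radial functions, the cosine cutoff, the `etas` values, and the linear `atomic_energy` model are illustrative assumptions, not the descriptor or regressor of any specific MLIP.

```python
import math


def cosine_cutoff(r, r_c):
    """Smoothly switch a pair contribution to zero at the cutoff radius r_c."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)


def radial_descriptor(positions, i, r_c, etas=(0.5, 1.0, 2.0)):
    """Toy descriptor X_i: one Gaussian-weighted neighbor sum per eta (assumed form)."""
    xi = [0.0] * len(etas)
    for j, pos_j in enumerate(positions):
        if j == i:
            continue
        r = math.dist(positions[i], pos_j)
        fc = cosine_cutoff(r, r_c)
        for k, eta in enumerate(etas):
            xi[k] += math.exp(-eta * r * r) * fc
    return xi


def atomic_energy(xi, weights, bias):
    """Stand-in for E_i(X_i): a linear model replacing the learned regressor."""
    return bias + sum(w * x for w, x in zip(weights, xi))


def total_energy(positions, r_c, weights, bias):
    """E_tot as the sum of atom-centered energies E_i(X_i)."""
    return sum(
        atomic_energy(radial_descriptor(positions, i, r_c), weights, bias)
        for i in range(len(positions))
    )
```

Because this descriptor depends only on interatomic distances inside the cutoff, the resulting energy is unchanged under translation, rotation, and reordering of like atoms, which is the symmetry-preserving property the descriptor families are built to guarantee.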

2. Model Classes and Regression Frameworks

MLIP architectures vary by complexity, data efficiency, computational cost, and extrapolative robustness:

Regression typically targets a weighted loss over energy, force, and (where available) stress errors; with energy and force terms only, it reads:

$$L = w_E\,\frac{1}{N_E} \sum (E_{\mathrm{pred}}-E_{\mathrm{ref}})^2 + w_F\,\frac{1}{3N_F} \sum \|\mathbf{F}_{i,\mathrm{pred}}-\mathbf{F}_{i,\mathrm{ref}}\|^2$$

with additional regularization on model parameters.
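The energy and force terms of this loss can be sketched in plain Python as follows. The function name `mlip_loss` and its argument layout are hypothetical; it is assumed here that $N_E$ counts labelled structures and $N_F$ counts force-labelled atoms, and the regularization term is omitted.

```python
def mlip_loss(e_pred, e_ref, f_pred, f_ref, w_e=1.0, w_f=1.0):
    """Weighted mean-squared error over energies and force components.

    e_pred, e_ref: sequences of N_E energies.
    f_pred, f_ref: sequences of N_F force vectors with 3 components each.
    Parameter regularization is deliberately left out of this sketch.
    """
    n_e = len(e_ref)
    n_f = len(f_ref)
    energy_term = sum((ep - er) ** 2 for ep, er in zip(e_pred, e_ref)) / n_e
    force_term = sum(
        sum((a - b) ** 2 for a, b in zip(fp, fr))  # squared norm of the force error
        for fp, fr in zip(f_pred, f_ref)
    ) / (3 * n_f)
    return w_e * energy_term + w_f * force_term
```

The weights `w_e` and `w_f` set the trade-off between energy and force accuracy; since each structure contributes one energy but three force components per atom, the force term usually carries far more training signal.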

3. Training Data Generation and Strategies

Construction of robust MLIPs demands diverse reference datasets sampled from quantum-mechanical calculations (typically DFT at the GGA level, but also r$^2$SCAN meta-GGA, CCSD(T)/CBS, etc.):

  • Static structures: Crystals, surfaces, defects, and alloys sampled over strain, volume, and composition (Pandey et al., 2022, Fellman et al., 2024).
  • Ab initio MD snapshots: Capturing vibrational, thermal, and liquid configurations at a range of temperatures (Pandey et al., 2022, Gong et al., 16 Aug 2025).
  • Active learning: On-the-fly relaxation and D-optimality selection to maximize training set informativeness while minimizing ab initio labeling (Gubaev et al., 2018).
  • Multi-fidelity learning: Joint training on multiple levels of theory (e.g., PBE, meta-GGA, CCSD(T)), leveraging abundant low-fidelity data with minimal high-fidelity coverage for optimal accuracy (Kim et al., 2024).
  • Ensemble knowledge distillation: Generating synthetic force labels via ensemble predictions when original QC datasets include only energies (Matin et al., 18 Mar 2025).

Dataset design principles prioritize phase, composition, and environment diversity to ensure interpolation robustness and controlled extrapolation risk.
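To illustrate the D-optimality idea behind active-learning selection, the sketch below greedily picks the candidate descriptor vectors that most increase the determinant of a regularized information matrix. This greedy scheme, the function name `greedy_d_optimal`, and the `ridge` parameter are simplifying assumptions for illustration and are not necessarily the selection algorithm used in the cited work.

```python
def greedy_d_optimal(candidates, n_select, ridge=1e-6):
    """Greedily choose rows maximizing det(ridge*I + sum_k x_k x_k^T).

    By the matrix determinant lemma, adding x multiplies the determinant by
    (1 + x^T A^-1 x), so each step picks the candidate with the largest
    x^T A^-1 x and then updates A^-1 with the Sherman-Morrison formula.
    """
    dim = len(candidates[0])
    # Inverse of the initial matrix A = ridge * I.
    a_inv = [[(1.0 / ridge if r == c else 0.0) for c in range(dim)] for r in range(dim)]
    chosen = []
    remaining = set(range(len(candidates)))
    for _ in range(min(n_select, len(candidates))):
        best, best_gain, best_u = None, -1.0, None
        for idx in sorted(remaining):
            x = candidates[idx]
            u = [sum(a_inv[r][c] * x[c] for c in range(dim)) for r in range(dim)]
            gain = sum(xr * ur for xr, ur in zip(x, u))  # x^T A^-1 x
            if gain > best_gain:
                best, best_gain, best_u = idx, gain, u
        chosen.append(best)
        remaining.remove(best)
        denom = 1.0 + best_gain
        for r in range(dim):
            for c in range(dim):
                a_inv[r][c] -= best_u[r] * best_u[c] / denom
    return chosen
```

In a labeling loop, only the returned configurations would be sent for ab initio calculation; near-duplicate environments score low once a similar one has been chosen, which is how the criterion limits redundant labels.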

4. Accuracy Metrics, Computational Cost, and Model Selection

Benchmarking frameworks evaluate MLIPs across:

Pareto optimization balances accuracy, computational cost, and usability; nonlinear ACE, MACE, Allegro, and NequIP lie on the speed/accuracy Pareto front for complex materials (Leimeroth et al., 5 May 2025).
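A Pareto front over cost and error can be extracted with a few lines of Python. The sketch below assumes each model is summarized by a hypothetical `(cost, error)` pair where lower is better on both axes; any numbers passed to it are placeholders, not benchmark results.

```python
def pareto_front(models):
    """Return names of non-dominated models, sorted by increasing cost.

    models: dict mapping a model name to a (cost, error) tuple.
    A model is dominated if another one is no worse on both axes
    and strictly better on at least one.
    """
    front = []
    for name, (cost, err) in models.items():
        dominated = any(
            (c <= cost and e <= err) and (c < cost or e < err)
            for other, (c, e) in models.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front, key=lambda n: models[n][0])
```

Model selection then reduces to choosing a point on this front according to the accuracy a given application needs and the cost it can afford.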

5. Physics-Informed and Weakly Supervised Extensions

Recent advances address deficiencies in generalization and conservative force prediction:

  • Physics-Informed Weakly Supervised Learning (PIWSL): Incorporates Taylor-expansion consistency and path-independence (PITC, PISC losses) into MLIP training. This enforces a consistent local energy–force response and conservative forces, yielding up to a $2.6\times$ reduction in energy errors and a 10–30% reduction in force errors under data scarcity. PIWSL also supports training without direct force references, enabling fine-tuning on high-level energies only (Takamoto et al., 2024).
  • Multi-fidelity and ensemble distillation: Enables MLIPs trained on partial or weak labels (e.g., only energies, with forces supplied by ensemble distillation) to reach near-benchmark performance and enhanced MD stability (Matin et al., 18 Mar 2025, Kim et al., 2024).
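The Taylor-expansion consistency idea can be illustrated with a first-order check: for a small displacement $\boldsymbol{\delta}$, a consistent model should satisfy $E(\mathbf{r}+\boldsymbol{\delta}) \approx E(\mathbf{r}) - \mathbf{F}(\mathbf{r})\cdot\boldsymbol{\delta}$. The sketch below measures that residual for a toy two-atom harmonic potential. The function names and the harmonic model are assumptions for illustration, and this is not the exact PITC loss of the cited work.

```python
import math


def harmonic_energy(positions, k=1.0, r0=1.0):
    """Toy two-atom energy E = 0.5 * k * (r - r0)^2."""
    r = math.dist(positions[0], positions[1])
    return 0.5 * k * (r - r0) ** 2


def harmonic_forces(positions, k=1.0, r0=1.0):
    """Analytic forces F = -dE/dr for the toy potential above."""
    a, b = positions
    r = math.dist(a, b)
    scale = k * (r - r0) / r
    f_b = tuple(-scale * (bj - aj) for aj, bj in zip(a, b))
    f_a = tuple(-c for c in f_b)
    return [f_a, f_b]


def taylor_residual(energy_fn, force_fn, positions, displacement):
    """|E(r + d) - (E(r) - F(r) . d)| for a small displacement d."""
    e0 = energy_fn(positions)
    forces = force_fn(positions)
    displaced = [
        tuple(p + d for p, d in zip(pos, dis))
        for pos, dis in zip(positions, displacement)
    ]
    work = sum(f * d for fv, dv in zip(forces, displacement) for f, d in zip(fv, dv))
    return abs(energy_fn(displaced) - (e0 - work))
```

With forces that are the true negative gradient, the residual shrinks quadratically with the displacement size; a force head that is inconsistent with the energy leaves a first-order residual. Penalizing such a residual during training needs no reference force labels, which is what makes energy-only fine-tuning possible.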

Integration with modern toolkits (Open Catalyst, DeePMD-kit) and robust hyperparameter tuning (e.g., Optuna) are critical for deployment.

6. High-Throughput Materials Design, Universal MLIPs, and Domain-Specific Impact

MLIPs unlock computationally intractable workflows in materials discovery and chemistry:

  • High-entropy/alloy screening and elastic/mechanical optimization: MTP-MLIPs combined with automated composition sampling enable rational design of alloys and direct property mapping, achieving near-DFT accuracy for bulk and mechanical constants (Pandey et al., 2022, Byggmästar et al., 2022).
  • Universal MLIPs: Attention-based GNN potentials (DPA-Semi, CHGNet, M3GNet-DIRECT, MACE-MP-0, ALIGNN-FF) generalize across hundreds of elements and crystal prototypes with no per-system retraining (Liu et al., 2023, Yu et al., 2024).
  • Disordered and amorphous systems: Fine-tuning universal models (CHGNet) on amorphous alloy datasets yields transferable MLIPs that accurately predict density, $E$, $T_g$, and Young's modulus, enabling direct composition–property mapping (Gong et al., 16 Aug 2025).
  • Ferroelectric, phase-change, and molecular systems: Minimalist MLIPs and body-order complete GNNs reproduce complex phase transitions, topologies, and nonlinear effects even on sparse, default-data regimes (Robredo-Magro et al., 21 Nov 2025, Wen et al., 2024).

Universal MLIP capabilities are rapidly expanding with transfer-learning, attention weighting, and comprehensive data benchmarks, enabling foundation potentials for the periodic table (Liu et al., 2023, Yu et al., 2024).

7. Emerging Directions, Limitations, and Best Practices

Current challenges and active frontiers include:

  • Extrapolation risk management: Out-of-domain predictions remain vulnerable to unphysical outputs; active learning and physics-based constraints only partially mitigate this (Takamoto et al., 2024, Mishin, 2021).
  • Incomplete conservative force enforcement: While curl reduction is possible, full path-independence and global energy–force consistency pose an open problem [240