CHGNet uMLIP: Universal ML Interatomic Potential
- CHGNet uMLIP is a universal interatomic potential that employs deep graph neural networks with explicit electronic descriptors, including atomic magnetic moments, to capture complex interatomic interactions.
- It delivers near DFT-level accuracy in energies, forces, and properties across a wide range of condensed matter systems, while achieving orders-of-magnitude computational speedup.
- Fine-tuning with high-energy DFT data and transfer learning strategies enhance its performance in out-of-domain scenarios such as high-pressure, defected, and alloyed materials.
A universal machine learning interatomic potential (uMLIP) is a surrogate atomistic model parameterized using deep learning, trained on a diverse, multi-element dataset spanning a large slice of chemical and structural configuration space. CHGNet is a representative uMLIP, built on an advanced graph neural network architecture with explicit electronic descriptors—particularly atomic magnetic moments—allowing it to encode both geometric and electronic structure features. CHGNet’s universal aim is to provide DFT-level accuracy in energies, forces, and properties across broad classes of condensed matter systems, including bulk, surface, defected, alloy, and low-dimensional materials, while maintaining orders-of-magnitude computational speed-up over quantum mechanical calculations.
1. Model Architecture and Representations
CHGNet is a deep graph neural network (GNN) employing both an atom graph and a "bond" (three-body) graph to capture many-body interatomic interactions, including angular terms. Its architecture differs from standard message-passing neural networks by incorporating not only spatial/geometric input features but also atomic magnetic moments, which are explicitly included in the node (atomic) feature vectors. This enables the model to describe magnetic and multivalent systems where such higher-level descriptors influence chemical bonding.
The total predicted energy, $E_{\mathrm{tot}}$, is decomposed into contributions from elemental reference ("AtomRef") energies and the neural network-predicted many-body interactions:

$$E_{\mathrm{tot}} \;=\; \sum_i \epsilon_{Z_i} \;+\; E_{\mathrm{GNN}}\!\left(\{\mathbf{r}_i\},\{Z_i\}\right),$$

where the $\epsilon_{Z_i}$ are per-element reference values and $E_{\mathrm{GNN}}$ is the learned contribution from the GNN. Forces are computed as the analytic gradient

$$\mathbf{F}_i \;=\; -\,\frac{\partial E_{\mathrm{tot}}}{\partial \mathbf{r}_i}.$$
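As a concrete illustration, a minimal prediction call with the chgnet Python package looks as follows (API names follow chgnet's public documentation and may differ across versions; the input file is illustrative):

```python
# Minimal prediction with the chgnet package; "POSCAR" is an illustrative input.
from pymatgen.core import Structure
from chgnet.model import CHGNet

model = CHGNet.load()                       # pre-trained universal weights
structure = Structure.from_file("POSCAR")   # any pymatgen-readable structure

pred = model.predict_structure(structure)
print(pred["e"])   # energy per atom (eV/atom)
print(pred["f"])   # forces, the analytic gradient -dE/dr (eV/Å)
print(pred["m"])   # predicted site magnetic moments (μB)
```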
The inclusion of additional channels such as site magnetic moments allows CHGNet to better encode charge and magnetic state information, providing an edge in systems with nontrivial electronic effects, such as transition metals, magnetic materials, and certain alloys (Focassio et al., 7 Mar 2024, Yu et al., 8 Mar 2024, Casillas-Trujillo et al., 25 Jun 2024).
2. Generalization: Performance and Challenges
CHGNet demonstrates good transferability and accuracy in equilibrium and near-equilibrium configurations, such as bulk structures, where it achieves low root mean square errors (RMSE) in energy and force prediction (e.g., bulk total energy RMSE ≈ 0.079 eV/atom; surface total energy RMSE ≈ 0.039 eV/atom for selected elements) (Focassio et al., 7 Mar 2024). However, for derived properties that depend on the difference between bulk and perturbed environments—such as surface energies, defect formation energies, or mixing enthalpies—performance is more nuanced.
For surface energy predictions, which require precise extrapolation to low-coordination/boundary environments, CHGNet underestimates surface energies (typical RMSE ≈ 0.51 J/m²), a phenomenon known as “potential energy surface softening.” This trend is observed across universal MLIP models and is attributed to the bulk-biased nature of training data—out-of-domain configurations are insufficiently sampled (Focassio et al., 7 Mar 2024, Deng et al., 11 May 2024).
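Because the surface energy is a small difference between two large totals, per-atom errors that are negligible for bulk energies dominate it. A minimal sketch of the bookkeeping, assuming per-atom totals (from CHGNet or DFT) are already in hand (function and variable names are ours):

```python
# Bookkeeping for a slab surface energy from per-atom totals.
# gamma = (E_slab - N * e_bulk) / (2A) for a slab with two equivalent faces.
def surface_energy(e_slab_per_atom, n_slab, e_bulk_per_atom, area_A2):
    """Energies in eV/atom, area in Å²; returns gamma in J/m²."""
    EV_PER_A2_TO_J_PER_M2 = 16.0218  # 1 eV/Å² = 16.0218 J/m²
    gamma_ev = (e_slab_per_atom * n_slab - n_slab * e_bulk_per_atom) / (2.0 * area_A2)
    return gamma_ev * EV_PER_A2_TO_J_PER_M2
```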
Prediction of mixing enthalpies in alloys, where the property of interest is a small energy difference (on the order of 10–50 meV/atom) between large absolute energies, is similarly sensitive to model accuracy: CHGNet often requires system-specific retraining to reproduce both sign and magnitude of the mixing enthalpy, since error cancellation is imperfect (Casillas-Trujillo et al., 25 Jun 2024).
| CHGNet Task | RMSE / MAE | Key Finding |
|---|---|---|
| Bulk total energy | ≈0.079 eV/atom | High accuracy relative to DFT |
| Surface total energy | ≈0.039 eV/atom | Outperforms some peers on totals, not on differences |
| Surface energy | ≈0.51 J/m² | Systematically underestimated ("softening") |
| Alloy mixing enthalpy | 10–50 meV/atom (typ.) | Needs retraining for chemically accurate signs |
| Phonon freq. (MAE) | ≈89 K (ω_max) | Moderate; outperformed by larger models |
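The same difference-of-large-numbers issue drives the mixing-enthalpy sensitivity discussed above. A minimal sketch with illustrative (hypothetical) per-atom energies:

```python
# Mixing enthalpy of a binary alloy from per-atom total energies.
# The target is a ~10-50 meV/atom difference between eV-scale totals,
# so per-endpoint errors must cancel almost exactly.
def mixing_enthalpy(e_alloy, e_a, e_b, x_a):
    """All energies in eV/atom; x_a is the fraction of element A."""
    return e_alloy - (x_a * e_a + (1.0 - x_a) * e_b)

# hypothetical numbers, for scale only: -0.012 eV/atom
dH = mixing_enthalpy(e_alloy=-4.512, e_a=-4.530, e_b=-4.470, x_a=0.5)
```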
The performance gap grows for systems further from the training set, e.g., high-pressure structures (Loew et al., 25 Aug 2025), strongly disordered alloys and glasses, or materials with substantial surface or defect content.
3. Systematic Softening and Fine-Tuning Solutions
A central weakness in universal MLIPs, including CHGNet, is the systematic "softening" of the potential energy surface (PES): both energies and forces are underpredicted when models are evaluated on high-energy or strongly out-of-equilibrium configurations. This originates from the dominance of near-equilibrium states in the pre-training dataset. The degree of softening is quantified by the "softening scale" $\sigma$, i.e., the slope of the model's force predictions relative to DFT:

$$\sigma \;=\; \operatorname*{arg\,min}_{s}\;\sum_i \bigl\lVert \mathbf{F}_i^{\mathrm{MLIP}} - s\,\mathbf{F}_i^{\mathrm{DFT}} \bigr\rVert^2,$$

with $\sigma < 1$ signaling a softened PES.
Correcting this systematic error can be achieved with remarkable data efficiency. Applying a simple linear scaling (multiplicative correction) to the output energies/forces, with the scaling derived from as little as a single high-energy DFT reference, removes the bias:

$$\mathbf{F}^{\mathrm{corr}} \;=\; \sigma^{-1}\,\mathbf{F}^{\mathrm{MLIP}}.$$

Here, $\sigma$ is determined by regression on a small OOD sample (Deng et al., 11 May 2024, Žguns et al., 10 Sep 2025). Empirical studies show that fine-tuning on ~100 DFT structures, especially from the relevant OOD regime, is sufficient to calibrate the force and stress parity plots (slopes move from ~0.6 to nearly 1), reduce force MAEs to the 100 meV/Å level, and bring MD-predicted observables (e.g., EXAFS spectra) into agreement with DFT or experiment (Žguns et al., 10 Sep 2025).
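In code, the diagnosis-and-rescale recipe is a one-parameter least-squares fit; a sketch under the definitions above (array layouts are ours):

```python
import numpy as np

def softening_scale(f_mlip, f_dft):
    """Least-squares slope sigma of F_MLIP ≈ sigma * F_DFT over all components."""
    x = np.asarray(f_dft).ravel()
    y = np.asarray(f_mlip).ravel()
    return float(x @ y / (x @ x))

def rescale_forces(f_mlip, sigma):
    """Multiplicative correction: F_corr = F_MLIP / sigma."""
    return np.asarray(f_mlip) / sigma
```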
4. Transfer Learning, Multi-Fidelity, and Dataset Considerations
Transfer learning and multi-fidelity workflows are essential for pushing CHGNet toward "true" universality. The main bottleneck in cross-functional (e.g., GGA→r²SCAN) transfer is the large and arbitrary offset in absolute energies between different DFT functionals (on the order of tens of eV/atom), leading to poor correlation and inefficient learning. This is addressed by explicitly aligning the elemental reference energies ("AtomRef" terms) before fine-tuning, i.e., refitting them on the high-fidelity data by least squares over compositions:

$$\{\epsilon_Z\} \;=\; \operatorname*{arg\,min}_{\{\epsilon_Z\}}\;\sum_s \Bigl( E_s^{\mathrm{hi\text{-}fi}} - \sum_{i \in s} \epsilon_{Z_i} \Bigr)^{2}.$$
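A sketch of this alignment step as a linear least-squares problem (array shapes and names are ours):

```python
import numpy as np

def fit_atom_refs(composition_counts, total_energies):
    """Refit per-element reference energies on high-fidelity data.

    composition_counts: (n_structures, n_elements) atom counts per structure.
    total_energies:     (n_structures,) high-fidelity total energies in eV.
    Returns eps_Z, one reference energy per element (eV/atom).
    """
    eps, *_ = np.linalg.lstsq(composition_counts, total_energies, rcond=None)
    return eps
```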
With appropriate referencing, transfer learning becomes stable and data-efficient: performance after fine-tuning with r²SCAN AtomRefs achieves significantly lower MAEs in energy, force, and formation energies, with training gradients an order of magnitude smaller (Huang et al., 7 Apr 2025). Scaling-law analysis demonstrates that a transfer-learned CHGNet matches or beats a scratch-trained model using ten times fewer high-fidelity data points.
Multi-fidelity learning can also be realized by learning the difference (Δ-learning) between low- and high-fidelity outputs, or by mixed training with explicit fidelity encoding (Huang et al., 7 Apr 2025). In global structure optimization, uMLIPs such as CHGNet serve as the foundation for active Δ-learning corrections, where Gaussian Process Regression (GPR) on SOAP descriptors augments the surrogate model to maintain robust energetic ordering during structural searches (Pitfield et al., 24 Jul 2025); a sketch follows.
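A hedged sketch of such a Δ-learning correction using dscribe's SOAP descriptors and scikit-learn's GPR (species, hyperparameters, and the residual target are illustrative; dscribe parameter names vary across versions):

```python
from dscribe.descriptors import SOAP
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# One averaged SOAP vector per structure; species/hyperparameters illustrative.
soap = SOAP(species=["Cu", "O"], r_cut=5.0, n_max=8, l_max=6, average="inner")
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=10.0), alpha=1e-6)

def fit_delta(train_structures, e_dft, e_umlip):
    """Fit the GPR on the residual E_DFT - E_uMLIP (lists of ase.Atoms / floats)."""
    X = soap.create(train_structures)
    gpr.fit(X, [d - u for d, u in zip(e_dft, e_umlip)])

def corrected_energy(atoms, e_umlip):
    """uMLIP energy plus the learned Δ-correction for one structure."""
    return e_umlip + gpr.predict(soap.create([atoms]))[0]
```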
5. Applications in Alloys, Defects, and High-Throughput Workflows
CHGNet has been deployed on a wide range of materials challenges. In alloys, it enables high-throughput screening of formation energies, mixing enthalpies, and volumes for broad classes of binary systems. While not always matching DFT in small energy differences, targeted retraining enables chemical accuracy (≤10 meV/atom error) in challenging systems. As a structure "pre-relaxer," it expedites DFT workflows by obviating iterative ab initio relaxations—structure is first efficiently minimized by CHGNet, then a single-point DFT calculation is performed for the final energy (Casillas-Trujillo et al., 25 Jun 2024).
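A sketch of the pre-relaxer pattern with chgnet's StructOptimizer (API per chgnet's documentation; the final DFT single point is left as pseudocode):

```python
from pymatgen.core import Structure
from chgnet.model import StructOptimizer

relaxer = StructOptimizer()
structure = Structure.from_file("POSCAR")      # illustrative input
result = relaxer.relax(structure, fmax=0.05)   # converge forces to 0.05 eV/Å
relaxed = result["final_structure"]

# Final energy: a single static DFT calculation on `relaxed` (e.g., via your
# VASP workflow manager) replaces the full ab initio relaxation.
```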
For defect modeling (vacancy formation energies, grain boundaries, etc.), large-scale screening is possible due to CHGNet’s low computational overhead and integration in atomistic modeling toolkits (e.g., ASE). Quantitative agreement with DFT defect energetics is demonstrated across thousands of materials, establishing CHGNet as suitable for high-throughput defect discovery and materials design (Berger et al., 9 Apr 2025).
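For example, an unrelaxed vacancy formation energy with CHGNet as an ASE calculator might look as follows (supercell, element, and the omission of relaxation are illustrative simplifications):

```python
from ase.build import bulk
from chgnet.model.dynamics import CHGNetCalculator

calc = CHGNetCalculator()

pristine = bulk("Al", "fcc", a=4.05).repeat((3, 3, 3))  # 108-atom supercell
pristine.calc = calc
e_bulk = pristine.get_potential_energy()

defected = pristine.copy()
del defected[0]                                         # create one vacancy
defected.calc = calc
e_vac = defected.get_potential_energy()

n = len(pristine)
e_form = e_vac - (n - 1) / n * e_bulk                   # unrelaxed E_f (eV)
```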
In phase diagram calculations, CHGNet (and similar uMLIPs) are integrated with CALPHAD workflows by replacing DFT in the energy/free-energy evaluation pipeline (e.g., via the ATAT toolkit), enabling orders-of-magnitude acceleration while maintaining phase stability predictions at a useful level of accuracy (Zhu et al., 22 Nov 2024).
6. Dynamic and Vibrational Properties: Phonons and Diffusion
In vibrational property prediction, CHGNet's accuracy is intermediate among universal MLIPs. It achieves a mean absolute error in maximum phonon frequency of ≈89 K—higher than MatterSim or SevenNet, but superior to some models that decouple forces from energy derivatives (Loew et al., 21 Dec 2024). The critical factor limiting accuracy in phonon calculations is the ability of the model to predict forces that are the analytic (energy-consistent) derivatives, as errors are amplified in the second derivatives (Hessian) relevant for vibrational analyses.
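A finite-displacement phonon run with CHGNet through ASE illustrates where this second-derivative sensitivity enters (APIs per ASE and chgnet documentation; supercell and displacement are illustrative):

```python
from ase.build import bulk
from ase.phonons import Phonons
from chgnet.model.dynamics import CHGNetCalculator

atoms = bulk("Si", "diamond", a=5.43)
ph = Phonons(atoms, CHGNetCalculator(), supercell=(3, 3, 3), delta=0.03)
ph.run()                 # forces on displaced supercells
ph.read(acoustic=True)   # force constants + acoustic sum rule
bands = ph.get_band_structure(atoms.cell.bandpath(npoints=100))
```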
Dynamic (diffusion-driven) properties are more challenging: in systematic benchmarks, CHGNet is outperformed by MatterSim and SevenNet in ionic conductivity (Li-ion diffusion) simulations. The key limiting factors are higher force errors (typically ≈70 meV/Å) and difficulties in maintaining accurate MD trajectories for complex diffusion mechanisms—this underscores the need for energy–force consistency and compositionally diverse training (Du et al., 14 Feb 2025).
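A sketch of a diffusivity workflow with chgnet's MD wrapper, using the Einstein relation D = MSD/(6t) in post-processing (structure file, ensemble settings, and run length are illustrative):

```python
from pymatgen.core import Structure
from chgnet.model.dynamics import MolecularDynamics

structure = Structure.from_file("Li3PS4.cif")        # illustrative electrolyte
md = MolecularDynamics(structure, ensemble="nvt",
                       temperature=800,              # K
                       timestep=2.0,                 # fs
                       trajectory="md.traj")
md.run(50_000)                                       # 100 ps

# Post-process: accumulate the Li mean-squared displacement over the trajectory
# and fit its slope; the Einstein relation gives D = slope / 6.
```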
7. Recent Extensions: Electrostatics, Scalability, and Uncertainty
Modern workflows extend CHGNet’s functionality along multiple axes. The Latent Ewald Summation (LES) framework augments CHGNet with long-range electrostatics by inferring latent atomic charges from local descriptors and computing Ewald sums, enabling improved accuracy in dielectric, polar, and interface systems without explicit charge training (Kim et al., 18 Jul 2025).
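Schematically, the long-range term is the standard reciprocal-space Ewald energy evaluated with learned charges; a hedged rendering of the idea (notation ours, not necessarily the paper's):

$$E_{\mathrm{LR}} \;=\; \frac{1}{2V}\sum_{\mathbf{k}\neq 0}\frac{4\pi}{k^{2}}\, e^{-k^{2}/4\alpha^{2}}\,\Bigl|\sum_{i} q_{i}\, e^{i\mathbf{k}\cdot\mathbf{r}_{i}}\Bigr|^{2}, \qquad q_{i} = q_{\theta}(\mathcal{D}_{i}),$$

where $q_{\theta}$ maps the local descriptor $\mathcal{D}_{i}$ of atom $i$ to a latent charge and $\alpha$ is the Ewald splitting parameter.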
Distributed inference with DistMLIP employs graph-level partitioning (vs. spatial partitioning) to achieve near-linear scalability. CHGNet simulations with millions of atoms are feasible in seconds using 8 GPUs, with negligible loss of numerical accuracy—a major step toward application in realistic, experimentally sized systems (Han et al., 28 May 2025).
A universal uncertainty metric, constructed by heterogeneous ensemble averaging across pre-trained uMLIPs (with proper RMSE weighting), provides configuration-level error estimation aligned with the true force prediction error (Spearman’s ρ ≈ 0.87 on diverse datasets). This enables uncertainty-aware distillation and fine-tuning, pseudo-label filtering, and cost-efficient, safe model development and autonomous simulation (Liu et al., 28 Jul 2025).
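A minimal sketch of an RMSE-weighted heterogeneous-ensemble force uncertainty in this spirit (the weighting scheme and reduction are ours, not necessarily the paper's exact definition):

```python
import numpy as np

def ensemble_force_uncertainty(forces_by_model, rmse_by_model):
    """RMSE-weighted spread of per-model forces around the weighted mean.

    forces_by_model: (n_models, n_atoms, 3); rmse_by_model: (n_models,).
    Returns one scalar uncertainty for the configuration.
    """
    w = 1.0 / np.asarray(rmse_by_model) ** 2
    w /= w.sum()
    f = np.asarray(forces_by_model)
    mean = np.tensordot(w, f, axes=1)          # weighted mean forces (n_atoms, 3)
    dev = np.linalg.norm(f - mean, axis=-1)    # (n_models, n_atoms)
    return float(np.sqrt((w[:, None] * dev**2).sum(axis=0).mean()))
```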
8. Limitations and Directions for Model and Dataset Improvement
CHGNet’s universality remains constrained by the limitations of its training data. High-pressure, far-from-equilibrium, surface-rich, and highly disordered or defected configurations are under-sampled, leading to deteriorated predictions in those regimes unless fine-tuning or retraining is performed (Loew et al., 25 Aug 2025, Focassio et al., 7 Mar 2024). Recent work demonstrates that enriching the training distribution (via active learning, high-pressure datasets, or aggressive data augmentation) substantially closes this gap, producing "true" universal models that robustly span the relevant configurational and chemical spaces.
The future path includes:
- Expanding datasets to include high-pressure, surface/interface, and high-energy configurations.
- Incorporating physically motivated terms (long-range electrostatics, polarization).
- Prioritizing force accuracy and energy–force consistency in loss weighting.
- Implementing robust transfer- and multi-fidelity learning pipelines for cross-functional generalization.
- Leveraging uncertainty metrics for adaptive data acquisition and deployment monitoring.
In sum, CHGNet exemplifies the power and limitations of modern uMLIPs: with careful data engineering and model adaptation, it can provide DFT-level accuracy for a diverse array of materials problems, though ongoing research is focused on further broadening its true universality and reliability across all relevant materials regimes.