
Machine Learning Potentials (MLPs)

Updated 12 November 2025
  • Machine Learning Potentials are data-driven surrogate models that simulate atomic interactions with quantum-mechanical accuracy and statistical scalability.
  • They employ information-theoretic motif sampling to uniformly cover diverse chemical environments, enhancing predictive transferability across compositions.
  • The PACE framework uses a linear atomic cluster expansion with advanced regularization to achieve efficient training and precise energy and force predictions.

Machine learning potentials (MLPs) are data-driven surrogate models that enable atomistic simulations with quantum-mechanical accuracy and statistical-mechanics scalability. MLPs have revolutionized structure/property prediction, dynamics, and phase diagram determination in an array of chemically complex and disordered materials, including multicomponent alloys. A central challenge in this domain is achieving predictive fidelity across the full compositional and structural landscape of such alloys, from perfectly ordered stoichiometric compounds to maximally disordered solid solutions. The integration of information-theoretic sampling strategies with advanced body-ordered descriptors—most notably, the Performant Atomic Cluster Expansion (PACE)—provides an effective means of constructing robust, transferable MLPs for alloys that span this range, as exemplified by recent developments in motif-based sampling (MBS) (Sheriff et al., 14 Jun 2025).

1. Information-Theoretic Sampling of Local Chemical Motifs

The critical link between the local environment of an atom and the macroscopic properties of a material motivates a formal definition of "motifs": each atom's first coordination polyhedron, labeled by the species of its nearest neighbors, defines a discrete motif $m$. In a database of $N$ local environments, the empirical motif frequency is

$$P(m) = \frac{\#\{\text{occurrences of } m\}}{N}$$

The combinatorial space of possible motifs for a multicomponent system is vast, and uniform coverage is neither guaranteed nor trivial to realize with random structure enumeration, especially for rare or complex motifs associated with certain compositions or short-range orders.

To rectify sampling bias, the motif distribution $P(m)$ is explicitly compared to the uniform reference $U(m) = 1/M$ (with $M$ the total number of motifs) using the Jensen–Shannon divergence,

$$D_{\mathrm{JS}}(P \| U) = \frac{1}{2} D_{\mathrm{KL}}(P \| \bar{M}) + \frac{1}{2} D_{\mathrm{KL}}(U \| \bar{M})$$

where $\bar{M} = \frac{1}{2}(P + U)$ is the mixture distribution and $D_{\mathrm{KL}}(P \| Q) = \sum_m P(m) \log_2 \frac{P(m)}{Q(m)}$.
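In code, the empirical distribution and its Jensen–Shannon distance to the uniform reference follow directly from motif counts. A minimal sketch (the function names and integer motif labels are illustrative, not from the paper):

```python
import numpy as np
from collections import Counter

def motif_distribution(motifs):
    """Empirical motif frequencies P(m) from a list of motif labels."""
    counts = Counter(motifs)
    n = len(motifs)
    return {m: c / n for m, c in counts.items()}

def js_divergence(p, num_motifs):
    """Jensen-Shannon divergence (in bits) between P and the uniform U = 1/M."""
    u = 1.0 / num_motifs
    d_pm = d_um = 0.0
    for m in range(num_motifs):
        pm = p.get(m, 0.0)
        mix = 0.5 * (pm + u)           # mixture distribution
        if pm > 0:
            d_pm += pm * np.log2(pm / mix)
        d_um += u * np.log2(u / mix)
    return 0.5 * d_pm + 0.5 * d_um

# Toy example: 3 possible motifs, heavily biased sampling
p = motif_distribution([0, 0, 0, 0, 1])
biased = js_divergence(p, num_motifs=3)
uniform = js_divergence({0: 1/3, 1: 1/3, 2: 1/3}, num_motifs=3)
```

A uniform distribution gives $D_{\mathrm{JS}} = 0$; the biased toy sample gives a positive value, which MBS drives down toward zero.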

Maximizing the Shannon entropy $H(P) = -\sum_m P(m) \log_2 P(m)$, which is equivalent to minimizing $D_{\mathrm{JS}}(P \| U)$, is operationalized through the motif-based sampling (MBS) algorithm:

  1. Generate an initial pool of alloy structures across the full composition simplex (e.g., fcc random substitutions).
  2. Within each structure, perform atom swaps to increase $H(P)$.
  3. Iterate until $P(m)$ is acceptably close to uniform across $m$.
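The swap loop above can be sketched on a toy system. The 1D-chain "motif" and the greedy acceptance rule below are illustrative stand-ins for the paper's coordination-polyhedron motifs and swap schedule:

```python
import random
import numpy as np
from collections import Counter

def motif(config, i):
    """Toy motif: sorted species of the two nearest neighbors on a
    periodic 1D chain (a stand-in for a first coordination polyhedron)."""
    n = len(config)
    a, b = config[i - 1], config[(i + 1) % n]
    return (min(a, b), max(a, b))

def entropy(config):
    """Shannon entropy H(P) of the motif distribution, in bits."""
    counts = Counter(motif(config, i) for i in range(len(config)))
    p = np.array(list(counts.values()), dtype=float) / len(config)
    return float(-(p * np.log2(p)).sum())

def mbs_swaps(config, n_trials=2000, seed=0):
    """Greedy atom swaps: keep a swap only if it does not lower H(P)."""
    rng = random.Random(seed)
    config = list(config)
    h = entropy(config)
    for _ in range(n_trials):
        i, j = rng.randrange(len(config)), rng.randrange(len(config))
        config[i], config[j] = config[j], config[i]
        h_new = entropy(config)
        if h_new >= h:
            h = h_new
        else:
            config[i], config[j] = config[j], config[i]  # revert
    return config, h

start = ["Cr"] * 8 + ["Co"] * 8 + ["Ni"] * 8  # fully segregated chain
final, h_final = mbs_swaps(start)
```

Because swaps are only accepted when the entropy does not decrease, $H(P)$ is monotonically nondecreasing and the segregated chain is driven toward uniform motif coverage.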

This approach ensures that even rare coordination environments—including those present in ordered or highly short-range-ordered regimes—are systematically incorporated into the training set.

2. Model Architecture: The Linear Atomic Cluster Expansion (PACE)

The PACE framework implements a linear atomic cluster expansion (ACE), which provides a systematically improvable, permutation- and rotation-invariant decomposition of the total energy:

$$E_{\mathrm{tot}} = \sum_i E_i, \qquad E_i = \sum_\alpha c_\alpha \Phi_\alpha(\mathcal{R}_i)$$

where $\Phi_\alpha$ are body-ordered cluster basis functions built from one- through $n$-body descriptors of atom $i$'s neighbor coordinates $\mathcal{R}_i = \{r_{ij}, \hat{\mathbf{r}}_{ij}, Z_j\}$. Each $\Phi_\alpha$ incorporates radial basis functions $g_n(r)$, spherical harmonics $Y_{\ell m}(\hat{\mathbf{r}})$, and Chebyshev polynomials. For two-body terms,

$$A_{i,n\ell m}^{(\alpha)} = \sum_{j \ne i} \delta_{Z_i, Z_j} \, g_n(r_{ij}) \, Y_{\ell m}(\hat{\mathbf{r}}_{ij})$$

Higher-order (three-body and beyond) invariants are formed via tensor contractions of these $A_{i,n\ell m}$. The PACE codebase automates construction and pruning of the resulting large basis set, ensuring computational tractability and expressivity.
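As an illustration of how such descriptors are assembled, the sketch below evaluates a single-channel two-body $A_{i,n\ell m}$ with a Chebyshev radial basis and real spherical harmonics up to $\ell = 1$. The cosine cutoff and normalization conventions are assumptions, and the species-resolved channels are omitted:

```python
import numpy as np

def radial_basis(r, n_max, r_cut):
    """Chebyshev radial basis g_n(r) on [0, r_cut], times a smooth
    cosine cutoff (one common choice; details vary by implementation)."""
    x = 2.0 * r / r_cut - 1.0                      # map [0, r_cut] -> [-1, 1]
    fc = 0.5 * (np.cos(np.pi * r / r_cut) + 1.0) * (r < r_cut)
    return np.array([np.cos(n * np.arccos(np.clip(x, -1, 1))) * fc
                     for n in range(n_max)])

def real_sph_harm_l1(unit_vec):
    """Real spherical harmonics up to l=1: [Y_00, Y_1-1, Y_10, Y_11]."""
    x, y, z = unit_vec
    c = np.sqrt(3.0 / (4.0 * np.pi))
    return np.array([0.5 / np.sqrt(np.pi), c * y, c * z, c * x])

def two_body_A(center_pos, neighbor_pos, n_max=4, r_cut=5.0):
    """A_{i,nlm} = sum_j g_n(r_ij) Y_lm(r_hat_ij) for one center atom."""
    A = np.zeros((n_max, 4))
    for pos in neighbor_pos:
        d = np.asarray(pos, dtype=float) - np.asarray(center_pos, dtype=float)
        r = np.linalg.norm(d)
        A += np.outer(radial_basis(r, n_max, r_cut), real_sph_harm_l1(d / r))
    return A

# Example: an octahedral shell of 6 neighbors at 2.5 Angstrom
shell = [(2.5, 0, 0), (-2.5, 0, 0), (0, 2.5, 0),
         (0, -2.5, 0), (0, 0, 2.5), (0, 0, -2.5)]
A = two_body_A((0, 0, 0), shell)
```

For this centrosymmetric shell the $\ell = 1$ components cancel exactly, illustrating the rotational structure the basis encodes.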

3. Training Workflow and Regularization

PACE-based MLPs are trained by simultaneously matching reference DFT energies and forces:

$$\mathcal{L} = \sum_{s \in \text{train}} w_E \left[ E_s^{\text{pred}} - E_s^{\text{DFT}} \right]^2 + w_F \sum_s \sum_{i \in s} \left\| \mathbf{F}_{s,i}^{\text{pred}} - \mathbf{F}_{s,i}^{\text{DFT}} \right\|^2$$

with $w_E$ and $w_F$ tuned so that energy and force residuals per atom contribute comparably in magnitude (empirically, $w_E \sim 10^{-2}\ \text{eV}^{-2}\,\text{atom}^{-1}$, $w_F \sim 10^{-1}\ (\text{eV/Å})^{-2}$). Tikhonov (ridge) regularization on the coefficients $c_\alpha$ prevents overfitting; $\lambda \sim 10^{-6}$–$10^{-5}$ is typically sufficient.
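Because the model is linear in $c_\alpha$, the weighted ridge problem has a closed-form solve. A toy sketch with synthetic data (the design matrix `Phi` stands in for the ACE basis evaluated on training structures; force rows, which would enter as gradients of `Phi`, are omitted):

```python
import numpy as np

def fit_linear_ace(Phi, targets, weights, lam=1e-6):
    """Weighted ridge solve for coefficients c_alpha:
    minimize sum_s w_s (Phi_s . c - y_s)^2 + lam * ||c||^2."""
    W = np.diag(weights)
    A = Phi.T @ W @ Phi + lam * np.eye(Phi.shape[1])
    b = Phi.T @ W @ targets
    return np.linalg.solve(A, b)

# Toy example: recover known coefficients from noiseless "energies"
rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 5))     # 50 structures x 5 basis functions
c_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = Phi @ c_true                   # synthetic reference energies
w = np.ones(50)                    # e.g. the energy weight w_E per structure
c_fit = fit_linear_ace(Phi, y, w, lam=1e-10)
```

With a tiny $\lambda$ and noiseless targets the true coefficients are recovered; larger $\lambda$ trades a small bias for robustness against noisy or redundant basis functions.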

Key features of the workflow include isotropic expansion to mimic finite temperature (up to 1000 K), random displacements, and small strains to sample vibrational anharmonicity.

With MBS, the final dataset achieves a motif "packing density" (fraction of possible motifs present) up to 30% higher than with random sampling, and reduces $D_{\mathrm{JS}}$ by up to 0.12 bits (e.g., $0.4814 \rightarrow 0.3661$ for CrCoNi with 702 structures).

4. Validation: Accuracy and Transferability across Compositions

Quantitative property prediction benchmarks illustrate the consequences of motif entropy sampling:

  • Energy/Force Accuracy (CrCoNi): MBS-trained MLPs achieve $\mathrm{MAE}_E \approx 1$ meV/atom ($<1$ meV/atom variation, even at high SRO), compared with 3–5 meV/atom and $\sim 5$ meV variation for randomly sampled MLPs; force RMSE is consistently $\lesssim 0.05$ eV/Å.
  • Phase Diagrams: Predicted fcc–bcc transition boundaries for Cr–Ni and Cr–Co binaries match experiment and CALPHAD within 25–30 K; Au–Pt miscibility-gap critical points deviate by only 5% from early experimental data.
  • Melting Points: For fcc CrCoNi, $T_m^{\mathrm{MLP}} = 1641$ K (3% below experiment). For bcc TaTiVW and derivatives, melting predictions are within 2–5% of experimental windows.
  • Short-Range Order and Lattice Expansion: Predicted Warren–Cowley $\alpha_{ij}$ and $a(T)$ for various compositions and temperatures are within experimental uncertainty ($<0.02$ in SRO, $<0.3\%$ in lattice parameter).
  • Thermodynamics: Specific heats ($c_p$) of TaTiVW alloys are predicted within 2–13% of high-temperature NIST and other measurements.
  • Fault Energies: Stacking-fault energies $\gamma_{\mathrm{SFE}}$ in CrCoNi at 500 K map correctly and match first-principles trends as Cr is reduced.

Crucially, MBS-MLPs reduce composition-dependent variation of energy and force errors by two orders of magnitude compared to broader “universal” MLPs (MatterSim, Orb, MACE).
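For reference, the first-shell Warren–Cowley parameter quoted above, $\alpha_{ab} = 1 - p(b \mid a) / c_b$, can be computed directly from a configuration and its neighbor lists. The sketch below uses a toy periodic 1D chain (an illustrative stand-in for a 3D first shell), where perfect A–B ordering gives $\alpha_{AB} = -1$:

```python
def warren_cowley(species, neighbor_lists, a, b):
    """First-shell Warren-Cowley parameter alpha_ab = 1 - p(b|a) / c_b,
    where p(b|a) is the fraction of b atoms among neighbors of a atoms
    and c_b is the overall concentration of species b."""
    c_b = species.count(b) / len(species)
    n_ab = n_a = 0
    for i, s in enumerate(species):
        if s != a:
            continue
        for j in neighbor_lists[i]:
            n_a += 1
            if species[j] == b:
                n_ab += 1
    return 1.0 - (n_ab / n_a) / c_b

# Toy periodic 1D chain: perfect A-B alternation (maximal ordering)
species = ["A", "B"] * 8
n = len(species)
nbrs = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
alpha_ab = warren_cowley(species, nbrs, "A", "B")
```

Negative $\alpha_{ab}$ indicates a preference for unlike neighbors (ordering), positive values indicate clustering, and $\alpha_{ab} = 0$ corresponds to an ideal random solution.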

5. Computational Requirements and Scaling

The cost of motif-based sampling and dataset construction is negligible relative to the reference quantum mechanical calculations. For example, a 702-structure MBS dataset can be built in less than 10% of the CPU time needed for a single-composition DFT–MC trajectory, yet delivers substantially improved transferability. PACE MLPs exhibit linear scaling ($O(N)$) with system size, and inference is efficient enough to support atomistic molecular dynamics of up to millions of atoms.

Dataset sizes for coverage (examples from the work):

  • Au–Pt: $\sim 200$ configurations across stoichiometries and SRO.
  • Cu–Au: $\sim 150$ configurations in analogous sampling.
  • Cr–Co–Ni: 66 (benchmark) and 702 (production) structures covering 12 compositions, multiple phases, and $\sim 20$ stoichiometric compounds.
  • Ti–Ta–V–W: $\sim 250$ configurations targeting equiatomic and derivative compositions.

6. Implications, Extensions, and Future Perspectives

The direct enforcement of motif uniformity achieves order-of-magnitude improvements in compositional transferability, enabling a single MLP to quantitatively predict properties from binary subsystems to high-entropy phases and even liquids without retraining. The motif-based approach is system-agnostic, and direct application to other metallic multicomponent alloys, intermetallics, and even non-metallic systems is feasible, contingent on the definition of an appropriate motif alphabet.

Prospective extensions include integration with generative structure proposals to automate the exploration and inclusion of relevant metastable phases. The methodology lays a foundation for the systematic construction of interatomic potentials that are both data-efficient and physically robust across the full compositional and structural landscape—directly addressing a principal limitation of prior universal or database-trained models.

In summary, the combination of information-theoretic sampling (maximally uniform motif entropy) with robust, body-ordered ACE models (PACE) yields a practical and efficient strategy for constructing transferable, high-fidelity MLPs for alloys. These models simultaneously capture chemical diversity, structural complexity, and thermal perturbations, providing a rigorous foundation for predictive atomistic simulation across phase diagrams, property trends, and order–disorder regimes.
