Papers
Topics
Authors
Recent
Search
2000 character limit reached

Tabulated Gaussian Approximation Potentials (tabGAP)

Updated 10 May 2026
  • Tabulated Gaussian Approximation Potentials (tabGAP) are machine-learned interatomic potentials that reconcile DFT accuracy with classical speed by tabulating low-dimensional GAP descriptors.
  • They replace expensive kernel summations with fast cubic spline interpolation over precomputed 1D and 3D grids, achieving speedups of 80× to 400× with minimal accuracy loss.
  • tabGAP leverages diverse DFT datasets and descriptor decomposition (two-body, three-body, and EAM) to reliably model energies, forces, and defect dynamics in complex materials.

Tabulated Gaussian Approximation Potentials (tabGAP) constitute a class of machine-learned interatomic potentials designed to reconcile ab initio accuracy with the computational efficiency required for large-scale atomistic simulations. tabGAP is derived by tabulating the contributions of a low-dimensional Gaussian Approximation Potential (GAP)—typically restricted to two-body, three-body, and sometimes embedded-atom (EAM) density terms—onto regular grids, enabling energy and force evaluations through fast spline interpolation rather than runtime kernel summation. This methodology yields ≥100× speed-ups over full GAP implementations with only minor loss of accuracy on structural, defect, and dynamical properties, as demonstrated across metals, oxides, alloys, and complex functional materials (Dominguez-Gutierrez et al., 2022, Byggmästar et al., 2022, Zhao et al., 2022, Fellman et al., 2024, Eugenio et al., 27 Nov 2025, Dicksona et al., 30 Dec 2025, Makkonen et al., 26 Nov 2025, Koskenniemi et al., 2022).

1. Mathematical Formulation and Descriptor Decomposition

The original GAP formalism expresses the total potential energy as a sum over local atomic energies, each modeled via Gaussian process regression (GPR) on a set of low-dimensional environment descriptors. The key components are:

  • Regression Framework:

E^(x)=k(x,X)[K+σn2I]1y\hat{E}(x_*) = k(x_*, X)\left[K + \sigma_n^2 I\right]^{-1} y

where xx_* is a new atomic environment (described by a vector of descriptors), XX is the set of MM sparse reference environments, yy the vector of DFT energies/forces, KK the kernel matrix, and σn2\sigma_n^2 the assumed noise variance. The most used kernel is the squared-exponential (SE) kernel:

kSE(x,x)=σf2exp(xx222)k_{\text{SE}}(x, x') = \sigma_f^2 \exp\left(-\frac{\|x - x'\|^2}{2\ell^2}\right)

with σf\sigma_f controlling variance, \ell the length scale.

  • Descriptor Decomposition:
    • Two-body: Pair distances xx_*0.
    • Three-body: Triplet configurations xx_*1.
    • Embedding-like: Local atomic density xx_*2, typically with xx_*3 a smooth function.

Each descriptor channel has its own kernel and set of regression weights.

2. Tabulation Strategy and Spline Construction

To avoid the xx_*4 runtime cost per atomic environment in standard GAP, tabGAP precomputes the regression predictions on regular grids in descriptor space:

  • 1D Tables (Two-body, EAM): For each relevant element-pair, energies (and sometimes density functions) are tabulated over a grid of xx_*5 (e.g., 500–5,000 points).
  • 3D Tables (Three-body): For each element triplet, energies are stored over a uniformly spaced grid in xx_*6 (typically xx_*7–xx_*8 points).
  • Interpolation: Cubic-spline interpolation is employed for both 1D and 3D tables, guaranteeing xx_*9 continuity (continuous energy, force, and second derivatives) (Dominguez-Gutierrez et al., 2022).

At runtime, local descriptors are mapped to table indices and interpolated; analytic spline derivatives yield forces.

3. Training Databases and Point Selection

tabGAP inherits its expressiveness from the quality and diversity of its GAP training database, which must include:

  • Bulk phases over varied volumes and configurations: e.g., ±5–10% strain, multiple polymorphs for oxides or alloys.
  • Surfaces, defective states, and high-energy snapshots: e.g., DFT of surfaces [(001), (110), ...], point and extended defects (vacancies, SIAs, interstitials), and snapshots from high-temperature or radiation events (Dominguez-Gutierrez et al., 2022, Fellman et al., 2024, Zhao et al., 2022, Dicksona et al., 30 Dec 2025).
  • Liquids and non-equilibrium configurations: e.g., melt-quench or finite-T MD structures.

Representative environments (sparse points) for GPR are selected using strategies such as farthest-point sampling, CUR decomposition, or k-means in descriptor space. For tabulation, uniform or adaptively sampled grids in each descriptor domain are constructed, with density chosen to achieve prescribed force/energy interpolation errors (<1 meV/Å, <0.01 eV/atom) (Eugenio et al., 27 Nov 2025).

4. MD Workflow and Efficient Implementation

Each MD step with tabGAP involves:

  1. Neighbor List Construction within species-appropriate cutoffs for each descriptor channel.
  2. Spline Lookups for all pairwise and three-body (and EAM) contributions:
    • For every pair XX0, evaluate 1D spline at XX1.
    • For each triplet XX2, evaluate 3D spline at XX3.
    • For each atom XX4, accumulate density and call the embedding spline.
  3. Force Calculation: Analytic derivatives of the spline representation provide forces, ensuring smooth dynamics and energy conservation.
  4. Short-Range Physics: Explicit repulsive terms—frequently ZBL-type, with pair-specific parameters fitted to all-electron DFT dimers—are tabulated and smoothly merged into the short-range domain.

The per-atom, per-timestep computational scaling matches that of classical empirical potentials for the same cutoff range, with achievable 80–400× speedups over full GAP (Dominguez-Gutierrez et al., 2022, Dicksona et al., 30 Dec 2025, Zhao et al., 2022).

5. Accuracy, Validation, and Performance Benchmarks

tabGAP delivers near-DFT fidelity across a spectrum of materials systems and benchmarks:

System (Reference) Energy RMSE (meV/atom) Force RMSE (meV/Å) Speedup over GAP
W (Dominguez-Gutierrez et al., 2022) ~2 ~40 ~5×
Cu, Al, Ni (Fellman et al., 2024) 0.5–1.6 15–70 80–150×
YBaXX5CuXX6OXX7 (Eugenio et al., 27 Nov 2025) 3.3 90 ~12×
XX8-Fe-H (Makkonen et al., 26 Nov 2025) 1.88 69 100–200×
GaXX9OMM0 (Zhao et al., 2022) ~11 160 400×
Mo-Nb-Ta-V-W (Byggmästar et al., 2022) ~2 (crystal) 60 (crystal) 40×

Key properties—lattice constants, elastic constants, stacking-fault energies, defect energetics, and threshold displacement energies—are reproduced within typical deviations of <1% for lattice/elastic properties, <0.02 eV for migration/barrier energies, and <10% for surface and stacking-fault energies relative to DFT or experiment (Dominguez-Gutierrez et al., 2022, Fellman et al., 2024, Dicksona et al., 30 Dec 2025, Eugenio et al., 27 Nov 2025, Makkonen et al., 26 Nov 2025).

6. Applications and Transferability

tabGAP is widely applied for:

The combination of low-dimensional descriptors, tabulation, and spline interpolation yields a practical approach for extending GAP accuracy to multi-billion-atom, long-timescale MD required for materials design, screening, and process modeling.

7. Limitations, Implementation Considerations, and Future Directions

tabGAP performance is fundamentally constrained by the descriptor set:

  • Descriptor Dimensionality: Only descriptors of ≤3 dimensions (e.g., 2b, 3b, EAM) are amenable to practical spline tabulation. High-dimensional descriptors (e.g., SOAP, message-passing) cannot be efficiently tabulated, limiting tabGAP’s ability to capture subtle symmetries or long-range charge/correlation effects (Zhao et al., 2022, Eugenio et al., 27 Nov 2025).
  • Grid Density and Memory Usage: Detailed grids (e.g., MM7 for three-body) require significant memory, and undersampling degrades force accuracy.
  • Chemistry and Transferability: tabGAPs are only as transferable and robust as the diversity of their DFT training sets; substantial retraining is needed to accommodate new chemistries or environments (e.g., surfaces, interfaces, new phases) (Fellman et al., 2024, Eugenio et al., 27 Nov 2025).

Best practice recommends explicit inclusion of all relevant configurations and defects in the DFT dataset, careful selection of cutoff radii and grid sizes, and validation against higher-level calculations or experimental observables. For high-accuracy or strongly correlated systems, augmentation with higher-body or long-range descriptors remains necessary (often with a trade-off in performance) (Zhao et al., 2022).

tabGAP has proven a robust, reproducible, and computationally efficient approach for quantum-accurate large-scale atomistic simulation in diverse materials, broadly adopted and validated in recent machine-learning interatomic potential deployments.

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Tabulated Gaussian Approximation Potentials (tabGAP).