Tabulated Gaussian Approximation Potentials (tabGAP)
- Tabulated Gaussian Approximation Potentials (tabGAP) are machine-learned interatomic potentials that reconcile DFT accuracy with classical speed by tabulating low-dimensional GAP descriptors.
- They replace expensive kernel summations with fast cubic spline interpolation over precomputed 1D and 3D grids, achieving speedups of 80× to 400× with minimal accuracy loss.
- tabGAP leverages diverse DFT datasets and descriptor decomposition (two-body, three-body, and EAM) to reliably model energies, forces, and defect dynamics in complex materials.
Tabulated Gaussian Approximation Potentials (tabGAP) constitute a class of machine-learned interatomic potentials designed to reconcile ab initio accuracy with the computational efficiency required for large-scale atomistic simulations. tabGAP is derived by tabulating the contributions of a low-dimensional Gaussian Approximation Potential (GAP)—typically restricted to two-body, three-body, and sometimes embedded-atom (EAM) density terms—onto regular grids, enabling energy and force evaluations through fast spline interpolation rather than runtime kernel summation. This methodology yields ≥100× speed-ups over full GAP implementations with only minor loss of accuracy on structural, defect, and dynamical properties, as demonstrated across metals, oxides, alloys, and complex functional materials (Dominguez-Gutierrez et al., 2022, Byggmästar et al., 2022, Zhao et al., 2022, Fellman et al., 2024, Eugenio et al., 27 Nov 2025, Dicksona et al., 30 Dec 2025, Makkonen et al., 26 Nov 2025, Koskenniemi et al., 2022).
1. Mathematical Formulation and Descriptor Decomposition
The original GAP formalism expresses the total potential energy as a sum over local atomic energies, each modeled via Gaussian process regression (GPR) on a set of low-dimensional environment descriptors. The key components are:
- Regression Framework:
where is a new atomic environment (described by a vector of descriptors), is the set of sparse reference environments, the vector of DFT energies/forces, the kernel matrix, and the assumed noise variance. The most used kernel is the squared-exponential (SE) kernel:
with controlling variance, the length scale.
- Descriptor Decomposition:
- Two-body: Pair distances 0.
- Three-body: Triplet configurations 1.
- Embedding-like: Local atomic density 2, typically with 3 a smooth function.
Each descriptor channel has its own kernel and set of regression weights.
2. Tabulation Strategy and Spline Construction
To avoid the 4 runtime cost per atomic environment in standard GAP, tabGAP precomputes the regression predictions on regular grids in descriptor space:
- 1D Tables (Two-body, EAM): For each relevant element-pair, energies (and sometimes density functions) are tabulated over a grid of 5 (e.g., 500–5,000 points).
- 3D Tables (Three-body): For each element triplet, energies are stored over a uniformly spaced grid in 6 (typically 7–8 points).
- Interpolation: Cubic-spline interpolation is employed for both 1D and 3D tables, guaranteeing 9 continuity (continuous energy, force, and second derivatives) (Dominguez-Gutierrez et al., 2022).
At runtime, local descriptors are mapped to table indices and interpolated; analytic spline derivatives yield forces.
3. Training Databases and Point Selection
tabGAP inherits its expressiveness from the quality and diversity of its GAP training database, which must include:
- Bulk phases over varied volumes and configurations: e.g., ±5–10% strain, multiple polymorphs for oxides or alloys.
- Surfaces, defective states, and high-energy snapshots: e.g., DFT of surfaces [(001), (110), ...], point and extended defects (vacancies, SIAs, interstitials), and snapshots from high-temperature or radiation events (Dominguez-Gutierrez et al., 2022, Fellman et al., 2024, Zhao et al., 2022, Dicksona et al., 30 Dec 2025).
- Liquids and non-equilibrium configurations: e.g., melt-quench or finite-T MD structures.
Representative environments (sparse points) for GPR are selected using strategies such as farthest-point sampling, CUR decomposition, or k-means in descriptor space. For tabulation, uniform or adaptively sampled grids in each descriptor domain are constructed, with density chosen to achieve prescribed force/energy interpolation errors (<1 meV/Å, <0.01 eV/atom) (Eugenio et al., 27 Nov 2025).
4. MD Workflow and Efficient Implementation
Each MD step with tabGAP involves:
- Neighbor List Construction within species-appropriate cutoffs for each descriptor channel.
- Spline Lookups for all pairwise and three-body (and EAM) contributions:
- For every pair 0, evaluate 1D spline at 1.
- For each triplet 2, evaluate 3D spline at 3.
- For each atom 4, accumulate density and call the embedding spline.
- Force Calculation: Analytic derivatives of the spline representation provide forces, ensuring smooth dynamics and energy conservation.
- Short-Range Physics: Explicit repulsive terms—frequently ZBL-type, with pair-specific parameters fitted to all-electron DFT dimers—are tabulated and smoothly merged into the short-range domain.
The per-atom, per-timestep computational scaling matches that of classical empirical potentials for the same cutoff range, with achievable 80–400× speedups over full GAP (Dominguez-Gutierrez et al., 2022, Dicksona et al., 30 Dec 2025, Zhao et al., 2022).
5. Accuracy, Validation, and Performance Benchmarks
tabGAP delivers near-DFT fidelity across a spectrum of materials systems and benchmarks:
| System (Reference) | Energy RMSE (meV/atom) | Force RMSE (meV/Å) | Speedup over GAP |
|---|---|---|---|
| W (Dominguez-Gutierrez et al., 2022) | ~2 | ~40 | ~5× |
| Cu, Al, Ni (Fellman et al., 2024) | 0.5–1.6 | 15–70 | 80–150× |
| YBa5Cu6O7 (Eugenio et al., 27 Nov 2025) | 3.3 | 90 | ~12× |
| 8-Fe-H (Makkonen et al., 26 Nov 2025) | 1.88 | 69 | 100–200× |
| Ga9O0 (Zhao et al., 2022) | ~11 | 160 | 400× |
| Mo-Nb-Ta-V-W (Byggmästar et al., 2022) | ~2 (crystal) | 60 (crystal) | 40× |
Key properties—lattice constants, elastic constants, stacking-fault energies, defect energetics, and threshold displacement energies—are reproduced within typical deviations of <1% for lattice/elastic properties, <0.02 eV for migration/barrier energies, and <10% for surface and stacking-fault energies relative to DFT or experiment (Dominguez-Gutierrez et al., 2022, Fellman et al., 2024, Dicksona et al., 30 Dec 2025, Eugenio et al., 27 Nov 2025, Makkonen et al., 26 Nov 2025).
6. Applications and Transferability
tabGAP is widely applied for:
- Large-scale MD simulations (≥101 atoms) of nanoindentation, defect dynamics, irradiation cascades, and mechanical loading (Dominguez-Gutierrez et al., 2022, Dicksona et al., 30 Dec 2025, Fellman et al., 2024).
- Materials chemistry: High-entropy alloys (HEAs), transition metals, complex oxides (e.g., exploration of Ga2O3 polymorphs (Zhao et al., 2022)).
- Radiation damage and high-energy processes: Accurate reproduction of defect creation, recombination, and clustering statistics in both metals (W, Mo, Fe) and superconducting oxides (YBa4Cu5O6) (Dicksona et al., 30 Dec 2025, Eugenio et al., 27 Nov 2025, Koskenniemi et al., 2022).
The combination of low-dimensional descriptors, tabulation, and spline interpolation yields a practical approach for extending GAP accuracy to multi-billion-atom, long-timescale MD required for materials design, screening, and process modeling.
7. Limitations, Implementation Considerations, and Future Directions
tabGAP performance is fundamentally constrained by the descriptor set:
- Descriptor Dimensionality: Only descriptors of ≤3 dimensions (e.g., 2b, 3b, EAM) are amenable to practical spline tabulation. High-dimensional descriptors (e.g., SOAP, message-passing) cannot be efficiently tabulated, limiting tabGAP’s ability to capture subtle symmetries or long-range charge/correlation effects (Zhao et al., 2022, Eugenio et al., 27 Nov 2025).
- Grid Density and Memory Usage: Detailed grids (e.g., 7 for three-body) require significant memory, and undersampling degrades force accuracy.
- Chemistry and Transferability: tabGAPs are only as transferable and robust as the diversity of their DFT training sets; substantial retraining is needed to accommodate new chemistries or environments (e.g., surfaces, interfaces, new phases) (Fellman et al., 2024, Eugenio et al., 27 Nov 2025).
Best practice recommends explicit inclusion of all relevant configurations and defects in the DFT dataset, careful selection of cutoff radii and grid sizes, and validation against higher-level calculations or experimental observables. For high-accuracy or strongly correlated systems, augmentation with higher-body or long-range descriptors remains necessary (often with a trade-off in performance) (Zhao et al., 2022).
tabGAP has proven a robust, reproducible, and computationally efficient approach for quantum-accurate large-scale atomistic simulation in diverse materials, broadly adopted and validated in recent machine-learning interatomic potential deployments.