Gaussian Approximation Potential (GAP)

Updated 12 May 2026

Gaussian Approximation Potential (GAP) is a machine-learned framework that uses Gaussian process regression to interpolate quantum mechanical potential energy surfaces with chemical accuracy.
It decomposes the total energy into local atomic contributions using symmetry-invariant descriptors like SOAP and employs sparsification for efficient training on ab initio data.
GAP delivers DFT-level performance for diverse systems—from crystals to liquids—enabling scalable, fast, and uncertainty-quantified atomistic simulations.

A Gaussian Approximation Potential (GAP) is a machine-learned interatomic potential framework that leverages Gaussian process regression to interpolate the Born–Oppenheimer potential energy surface (PES) of atomic and molecular systems from quantum mechanical data. GAP models have established themselves as a leading class of data-driven atomistic potentials capable of achieving chemical accuracy and broad transferability across crystalline, amorphous, molecular, and defective configurations, at computational costs orders of magnitude lower than direct ab initio calculations. Their defining characteristics are (1) a locality ansatz decomposing the total energy into a sum of local atomic contributions, (2) the use of symmetry-invariant descriptors such as SOAP or the bispectrum for encoding atomic environments, and (3) the adoption of nonparametric Gaussian process regression over a (sparse) set of reference environments to learn these local energies from first-principles data.

1. Mathematical Foundations of GAP

In the GAP formalism, the total energy $E_{\rm tot}$ of a configuration of $N$ atoms is expressed as a sum of local atomic site energies,

$$E_{\rm tot} = \sum_{i=1}^N \varepsilon(\mathbf{x}_i),$$

where $\mathbf{x}_i$ is a descriptor vector encoding the symmetry-invariant local environment of atom $i$ within a cutoff radius. Each site energy $\varepsilon(\mathbf{x}_i)$ is represented by Gaussian process regression (GPR) over a set of $M$ “sparse” reference environments:

$$\varepsilon(\mathbf{x}_i) = \sum_{j=1}^{M} \alpha_j\,k(\mathbf{x}_i, \mathbf{x}_j).$$

Here $k(\cdot,\cdot)$ is a positive-definite kernel measuring the similarity of two local environments, and the coefficients $\{\alpha_j\}$ are determined by solving a regularized linear system that simultaneously fits quantum mechanical data for total energies, forces, and (if available) stresses. Because each observation is a linear combination of the unknown site energies (total energies) or of their derivatives (forces and stresses), the regression is formulated with an observation matrix $\mathbf{L}$ that maps local energies onto DFT observables (Klawohn et al., 2023, Bartók, 2010, Bartók et al., 2015).
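
A minimal NumPy sketch of the two expressions above, assuming a squared-exponential kernel and random placeholder descriptors and coefficients (none of these values come from a real GAP model). It shows that the total energy is simply a sum of kernel-weighted site energies, which makes it invariant to the ordering of atoms and extensive in system size.

```python
import numpy as np

def sq_exp_kernel(X, Y, delta=1.0, theta=1.0):
    """Squared-exponential kernel matrix between rows of X (n, d) and Y (m, d)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return delta**2 * np.exp(-d2 / (2.0 * theta**2))

def site_energies(X_atoms, X_sparse, alpha, **kernel_args):
    """epsilon(x_i) = sum_j alpha_j k(x_i, x_j), evaluated for every atom i."""
    return sq_exp_kernel(X_atoms, X_sparse, **kernel_args) @ alpha

def total_energy(X_atoms, X_sparse, alpha, **kernel_args):
    """Locality ansatz: E_tot = sum_i epsilon(x_i)."""
    return float(site_energies(X_atoms, X_sparse, alpha, **kernel_args).sum())

rng = np.random.default_rng(0)
X_sparse = rng.normal(size=(5, 3))  # M = 5 hypothetical sparse reference environments
alpha = rng.normal(size=5)          # placeholder coefficients (a real fit would supply these)
X_atoms = rng.normal(size=(8, 3))   # descriptors of the N = 8 atoms in one configuration
E = total_energy(X_atoms, X_sparse, alpha)
```

Permuting the rows of `X_atoms` leaves `E` unchanged, and stacking two copies of the configuration doubles it.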

The kernel function $k(\mathbf{x}, \mathbf{x}')$ is typically chosen to be a squared-exponential or a polynomial (dot-product) kernel operating in the space of descriptors. For models employing higher body-order interactions or multi-species systems, the total energy can be decomposed into additive contributions (e.g., two-body, three-body, many-body/SOAP), with independent GPR models for each contribution. The full covariance between observables is correspondingly the sum of the covariances from each model (Bartók et al., 2015, Deringer et al., 2016).
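
The sketch below, under simplifying assumptions, builds both kernel types named above and combines them additively, as for a two-body plus many-body model. Each environment carries a hypothetical 1-D "pair distance" block and a normalized 6-D "many-body" block; in a real GAP the two-body term is summed over atom pairs rather than atoms, and the weights `delta_2b`, `delta_mb` here are arbitrary. Because a weighted sum of positive-semidefinite kernels is again positive semidefinite, the combined covariance matrix has no negative eigenvalues beyond round-off.

```python
import numpy as np

def sq_exp(X, Y, theta=1.0):
    """Squared-exponential kernel on descriptor rows."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * theta**2))

def dot_product(X, Y, zeta=2):
    """Polynomial (dot-product) kernel (x . y)^zeta; integer zeta keeps it positive semidefinite."""
    return (X @ Y.T) ** zeta

rng = np.random.default_rng(0)
n = 20
D_2b = rng.uniform(1.0, 3.0, size=(n, 1))            # hypothetical two-body descriptor block
D_mb = rng.normal(size=(n, 6))
D_mb /= np.linalg.norm(D_mb, axis=1, keepdims=True)  # normalized many-body descriptor block

# Independent GP models for each term, so their covariances simply add
delta_2b, delta_mb = 0.5, 1.0
K = delta_2b**2 * sq_exp(D_2b, D_2b, theta=0.5) + delta_mb**2 * dot_product(D_mb, D_mb, zeta=4)
eigenvalues = np.linalg.eigvalsh(K)
```

Each diagonal entry of `K` equals `delta_2b**2 + delta_mb**2`, i.e. the prior variances of the two contributions add as well.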

The noise/regularization hyperparameters are incorporated in a diagonal matrix $\boldsymbol{\Sigma}$ whose entries quantify the expected error in each observation (energies, forces, etc.), reflecting both quantum mechanical uncertainties and the finite fit resolution (George et al., 2020).
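
A minimal sketch of the regularized sparse solve, calling the observation matrix $\mathbf{L}$ and the diagonal expected-error matrix $\boldsymbol{\Sigma}$. It assumes energy observations only (force rows of $\mathbf{L}$ would hold kernel derivatives and are omitted for brevity), a squared-exponential kernel, synthetic 2-D descriptors, a made-up "true" site-energy function, and randomly chosen sparse points. It solves the standard sparse-GP normal equations $(\mathbf{K}_{MM} + \mathbf{K}_{MN}\mathbf{L}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{L}\mathbf{K}_{NM})\boldsymbol{\alpha} = \mathbf{K}_{MN}\mathbf{L}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{y}$, where the index $N$ here runs over all training environments.

```python
import numpy as np

def sq_exp(X, Y, theta=1.0):
    """Squared-exponential kernel matrix between descriptor rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * theta**2))

rng = np.random.default_rng(0)
n_conf = 60
n_atoms = rng.integers(3, 9, size=n_conf)        # 3-8 atoms per hypothetical configuration
X = rng.normal(size=(n_atoms.sum(), 2))          # synthetic 2-D descriptors of all environments
owner = np.repeat(np.arange(n_conf), n_atoms)    # configuration each environment belongs to

# Observation matrix L: each total energy is the sum of its configuration's site energies
L = np.zeros((n_conf, len(X)))
L[owner, np.arange(len(X))] = 1.0

eps_true = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2  # synthetic "true" site energies
sigma_E = 0.01                                    # expected error per energy observation
y = L @ eps_true + rng.normal(scale=sigma_E, size=n_conf)

# M = 30 sparse points chosen at random here (CUR or FPS would be used in practice)
M = 30
X_M = X[rng.choice(len(X), size=M, replace=False)]

K_MM = sq_exp(X_M, X_M) + 1e-8 * np.eye(M)      # small jitter for numerical stability
LK = L @ sq_exp(X, X_M)                          # covariance of observations with sparse points
Sigma_inv = np.eye(n_conf) / sigma_E**2          # inverse of the diagonal expected-error matrix

A = K_MM + LK.T @ Sigma_inv @ LK
b = LK.T @ Sigma_inv @ y
alpha = np.linalg.solve(A, b)
y_fit = LK @ alpha                               # fitted total energies
```

Since $\boldsymbol{\alpha} = \mathbf{0}$ is always a candidate, the regularized optimum can never fit the observations worse than predicting zero energy; shrinking the entries of $\boldsymbol{\Sigma}$ makes the fit follow the data more closely at the risk of overfitting.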

2. Atomic Environment Descriptors

A cornerstone of GAP is the representation of atomic environments as symmetry-invariant descriptors. Two families dominate:

Two-body, three-body, and n-body geometric descriptors: These include simple functions of interatomic distances (two-body), angular triples (three-body), and combinations thereof. Such descriptors are explicitly symmetrized under permutations of neighbors.
Smooth Overlap of Atomic Positions (SOAP): SOAP constructs a continuous neighbor density for each atom as a sum of Gaussians centered on nearby atoms:

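$$\rho_i(\mathbf{r}) = \sum_{j} f_{\rm cut}(r_{ij})\,\exp\!\left(-\frac{|\mathbf{r} - \mathbf{r}_{ij}|^2}{2\sigma_{\rm at}^2}\right),$$

where the sum runs over the neighbors $j$ of atom $i$, $\mathbf{r}_{ij}$ is the position of neighbor $j$ relative to atom $i$, $r_{ij} = |\mathbf{r}_{ij}|$, $\sigma_{\rm at}$ is the Gaussian width, and $f_{\rm cut}$ is a smooth cutoff function.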

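This density is expanded in a basis of radial functions $g_n(r)$ and spherical harmonics $Y_{lm}(\hat{\mathbf{r}})$, $\rho_i(\mathbf{r}) = \sum_{nlm} c^{(i)}_{nlm}\, g_n(r)\, Y_{lm}(\hat{\mathbf{r}})$. Rotationally and permutationally invariant “power spectrum” components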

$$p^{(i)}_{nn'l} = \sum_{m=-l}^{l} c^{(i)}_{nlm} \left(c^{(i)}_{n'lm}\right)^{*}$$

are formed and typically normalized, yielding a SOAP vector for each atom (Klawohn et al., 2023, Deringer et al., 2016, Bartók et al., 2015, Bernstein, 2024). The SOAP kernel is a (possibly polynomial) function of the dot product between normalized SOAP vectors, often raised to a sharpness exponent:

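$$k(\mathbf{x}_i, \mathbf{x}_j) = \delta^2 \left( \frac{\mathbf{x}_i \cdot \mathbf{x}_j}{\|\mathbf{x}_i\|\,\|\mathbf{x}_j\|} \right)^{\zeta},$$

where $\mathbf{x}_i$ collects the power spectrum components $p^{(i)}_{nn'l}$ of atom $i$, $\zeta$ is the sharpness exponent, and $\delta$ sets the energy scale of the kernel.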
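
The following NumPy sketch illustrates why the power spectrum is invariant. It takes the point-density limit ($\sigma_{\rm at} \to 0$), assumes a hypothetical Gaussian radial basis, cutoff radius, and cosine cutoff, and uses the spherical-harmonic addition theorem, so that $p_{nn'l}$ depends only on neighbor distances and the angles between neighbors. It is not the QUIP or soap_turbo implementation, whose radial integrals and normalization differ. The normalized dot-product kernel with $\zeta = 4$ is applied at the end.

```python
import numpy as np
from numpy.polynomial import legendre

R_CUT = 5.0                          # hypothetical cutoff radius (Å)
CENTERS = np.linspace(1.0, 4.5, 4)   # hypothetical Gaussian radial basis centres, n = 0..3
L_MAX = 4

def f_cut(r):
    """Smooth cosine cutoff: 1 at r = 0, 0 at and beyond R_CUT."""
    return np.where(r < R_CUT, 0.5 * (np.cos(np.pi * r / R_CUT) + 1.0), 0.0)

def radial_basis(r):
    """g_n(r) for every neighbour and every n (simple Gaussians; an assumption)."""
    return np.exp(-((r[:, None] - CENTERS[None, :]) ** 2) / 0.5)

def power_spectrum(neighbors):
    """Power spectrum p_{nn'l} of a point-like neighbour density (sigma_at -> 0).

    With c_nlm = sum_j w_jn Y*_lm(u_j), the addition theorem gives
    sum_m c_nlm c*_n'lm = sum_{j,k} w_jn w_kn' (2l+1)/(4 pi) P_l(u_j . u_k).
    """
    r = np.linalg.norm(neighbors, axis=1)
    u = neighbors / r[:, None]                     # unit vectors to neighbours
    w = radial_basis(r) * f_cut(r)[:, None]        # weights w_jn, shape (n_neighbors, n_max)
    cos_angles = np.clip(u @ u.T, -1.0, 1.0)
    iu = np.triu_indices(len(CENTERS))             # keep n <= n' (each block is symmetric)
    blocks = []
    for l in range(L_MAX + 1):
        coeffs = np.zeros(l + 1)
        coeffs[l] = 1.0                            # selects the Legendre polynomial P_l
        P_l = legendre.legval(cos_angles, coeffs)
        blocks.append(((2 * l + 1) / (4 * np.pi) * w.T @ P_l @ w)[iu])
    return np.concatenate(blocks)

def soap_kernel(p1, p2, zeta=4, delta=1.0):
    """delta^2 * (normalized dot product)^zeta between two power-spectrum vectors."""
    return delta**2 * (p1 @ p2 / (np.linalg.norm(p1) * np.linalg.norm(p2))) ** zeta

rng = np.random.default_rng(0)
env = rng.uniform(-3.0, 3.0, size=(12, 3))     # hypothetical neighbour positions (Å)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
p = power_spectrum(env)
p_rot = power_spectrum(env @ Q.T)              # rotated (possibly reflected) copy
p_perm = power_spectrum(env[rng.permutation(12)])
```

`p`, `p_rot`, and `p_perm` agree to round-off, so the kernel between an environment and its rotated copy is exactly $\delta^2$; by the Cauchy–Schwarz inequality it can never exceed that value for any pair.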

Extensions to the basic SOAP formalism allow for the inclusion of multiple atomic species, through species-resolved densities and power spectra, as well as “compression” schemes to mitigate the unfavorable scaling with species count (Klawohn et al., 2023, Bernstein, 2024). Other descriptors (e.g., localized Coulomb matrices) have been shown to be competitive in specific contexts (Barker et al., 2016).

These descriptors guarantee invariance to global translation, rotation, and permutation of like atoms, which is critical for transferability and robustness (Klawohn et al., 2023, Bartók et al., 2015).

3. Model Training, Sparsification, and Regularization

The training database must span the relevant configuration space and typically comprises energies, atomic forces, and virial stresses computed from quantum mechanical calculations (DFT, coupled cluster, etc.) for diverse structures: bulk, surfaces, point and extended defects, clusters, liquids, and amorphous phases, depending on the target application (Unruh et al., 2021, Jana et al., 2023, Deringer et al., 2016).

Because a full database may contain millions of atomic environments, computational feasibility is achieved by sparsification: selecting a subset of $M$ “inducing” or “sparse” points, with $M$ much smaller than the total number of training environments, that represents the diversity of local environments. CUR decomposition, farthest-point sampling in descriptor space, and unsupervised clustering are used for this selection (Klawohn et al., 2022, Klawohn et al., 2023). Training then involves a linear solve whose cost scales as $\mathcal{O}(N_{\rm obs} M^2)$, where $N_{\rm obs}$ is the number of training observations, and predicting a single site energy costs $\mathcal{O}(M)$ kernel evaluations (Klawohn et al., 2023).
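
Farthest-point sampling, one of the selection strategies listed above, can be sketched greedily in a few lines. The descriptors are synthetic (a dense cluster of common environments plus a few rare ones), and Euclidean distance in descriptor space is an assumption; the point is that FPS picks up rare environments that uniform random selection would likely miss.

```python
import numpy as np

def farthest_point_sampling(X, m, first=0):
    """Greedy FPS: repeatedly add the environment farthest from all those already chosen."""
    chosen = [first]
    d_min = np.linalg.norm(X - X[first], axis=1)   # distance of every row to the chosen set
    for _ in range(m - 1):
        nxt = int(np.argmax(d_min))
        chosen.append(nxt)
        d_min = np.minimum(d_min, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(scale=0.1, size=(500, 4)),          # 500 common, mutually similar environments
    rng.normal(loc=3.0, scale=0.1, size=(5, 4)),   # 5 rare, distinct environments
])
idx = farthest_point_sampling(X, 10)
```

The second pick already comes from the rare cluster (index 500 or above). Each iteration is one pass over the data, so selecting $M$ points from $N$ environments takes $\mathcal{O}(NM)$ distance evaluations.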

Regularization (“expected errors”) is essential to balance fidelity to the training data and avoidance of overfitting. Adaptive regularization schemes that assign tolerance based on force magnitudes or structural complexity have been shown to improve transferability, e.g., fitting phonon spectra while retaining accuracy in liquids (George et al., 2020).
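
As an illustration only, the sketch below assumes a simple force-dependent rule, $\sigma_i = \max(\sigma_{\min}, f\,|\mathbf{F}_i|)$, with hypothetical values $f = 0.1$ and $\sigma_{\min} = 0.05$ eV/Å; the scheme of George et al. (2020) may use a different functional form. Large forces, typical of high-energy or liquid snapshots, receive proportionally looser tolerances, while small near-equilibrium forces are held to the floor value.

```python
import numpy as np

def force_sigmas(forces, frac=0.1, sigma_min=0.05):
    """Assumed rule: sigma_i = max(sigma_min, frac * |F_i|), in the units of the forces."""
    magnitudes = np.linalg.norm(forces, axis=1)
    return np.maximum(sigma_min, frac * magnitudes)

forces = np.array([[0.0, 0.0, 0.01],   # near-equilibrium atom (eV/Å, hypothetical)
                   [1.0, 0.0, 0.0],
                   [3.0, 4.0, 0.0]])   # strongly displaced atom, |F| = 5 eV/Å
sig = force_sigmas(forces)             # per-atom tolerances in eV/Å
Sigma_diag = np.repeat(sig, 3) ** 2    # diagonal of Sigma for the 3N force components
```

Here `sig` is `[0.05, 0.1, 0.5]` eV/Å, and each value is repeated for the x, y, and z components before squaring into variances.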

GAP software supports active-learning workflows (variance-driven data selection), multi-body decomposition, scalable MPI-parallel fitting, and explicit, user-specified hyperparameters (Klawohn et al., 2022, Klawohn et al., 2023, Zhang et al., 2022).

4. Practical Performance, Transferability, and Limitations

Across diverse material classes, GAP models achieve sub-meV/atom energy root mean square error (RMSE) and force RMSEs in the 0.04–0.13 eV/Å range with respect to DFT, sometimes approaching the theoretical locality limit imposed by the finite descriptor cutoff (Unruh et al., 2021, Jana et al., 2023, Byggmästar et al., 2020, Banik et al., 2024). Quantitative agreement is routinely reported for:

Lattice constants, elastic moduli, and phonon spectra matching DFT within experimental uncertainty (George et al., 2020, Unruh et al., 2021, Szlachta, 2014).
Defect formation energies, migration barriers, and surface energies (Szlachta, 2014, Deringer et al., 2016, Shenoy et al., 2023).
Liquid/amorphous structure and dynamics, including radial/angular distribution functions, glass transition, void distribution, and vibrational density of states (Unruh et al., 2021, Deringer et al., 2016).
Nanoparticle energetics, structural motifs, and melting behavior (Jana et al., 2023, Banik et al., 2024).
Thermomechanical properties including melting curves at extreme pressures (Byggmästar et al., 2020).
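
For reference, the two error metrics quoted above can be computed as follows. This sketch assumes energy errors are normalized per atom before averaging and that the force RMSE is taken over Cartesian components; some papers instead use force-vector norms, so conventions should be checked before comparing numbers. The arrays are toy values.

```python
import numpy as np

def energy_rmse_per_atom(E_pred, E_ref, n_atoms):
    """RMSE of total-energy errors after dividing each by its atom count (eV/atom)."""
    diff = (np.asarray(E_pred) - np.asarray(E_ref)) / np.asarray(n_atoms)
    return float(np.sqrt(np.mean(diff**2)))

def force_rmse(F_pred, F_ref):
    """RMSE over all Cartesian force components (eV/Å)."""
    d = np.asarray(F_pred) - np.asarray(F_ref)
    return float(np.sqrt(np.mean(d**2)))

# Toy numbers: two configurations with 2 and 4 atoms
e_rmse = energy_rmse_per_atom([-10.002, -20.0], [-10.0, -20.004], [2, 4])
f_rmse = force_rmse([[0.1, 0.0, 0.0], [0.0, 0.0, 0.0]], np.zeros((2, 3)))
```

Here `e_rmse` is 0.001 eV/atom (1 meV/atom), and `f_rmse` is $0.1/\sqrt{6}$ eV/Å because the single 0.1 eV/Å error is averaged over six components.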

The SOAP-based GAP has proven particularly effective for materials with significant many-body interactions, angular character, or disorder. The transferability of a GAP model is governed by the diversity of the training set: extrapolation to out-of-sample structures is flagged by increases in the GPR predictive variance, and mitigation requires active enrichment of the database (Zhang et al., 2022, Klawohn et al., 2023). GAP performance can deteriorate for long-range interactions (electrostatics, dispersion) unless explicitly included via auxiliary schemes, or for system types far removed from the training data (Unruh et al., 2021, Bernstein, 2024).
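
The extrapolation warning can be illustrated with one standard sparse-GP error estimate, the deterministic training conditional (DTC) variance; whether a given GAP code uses exactly this expression is not assumed here. The 1-D training descriptors, kernel width, noise level, and sparse points are all hypothetical, and site energies are treated as directly observed. Inside the sampled range the variance is small, while far outside it returns to the prior value $\delta^2 = 1$.

```python
import numpy as np

def sq_exp(a, b, theta=0.5):
    """Squared-exponential kernel (prior variance delta^2 = 1) between 1-D descriptor arrays."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * theta**2))

rng = np.random.default_rng(0)
x_train = rng.uniform(-2.0, 2.0, size=50)   # hypothetical 1-D descriptors of training environments
z = np.linspace(-2.0, 2.0, 10)              # M = 10 sparse points
sigma = 0.05                                 # expected error of each (direct) site-energy observation

K_MM = sq_exp(z, z) + 1e-8 * np.eye(len(z))  # jitter for numerical stability
K_NM = sq_exp(x_train, z)
A = K_MM + K_NM.T @ K_NM / sigma**2

def predictive_variance(x_star):
    """DTC variance: k(x*,x*) - k*M K_MM^-1 kM* + k*M A^-1 kM*."""
    k = sq_exp(np.atleast_1d(np.asarray(x_star, dtype=float)), z)   # shape (n_star, M)
    nystrom = np.einsum("ij,ji->i", k, np.linalg.solve(K_MM, k.T))
    posterior = np.einsum("ij,ji->i", k, np.linalg.solve(A, k.T))
    return 1.0 - nystrom + posterior

var_inside, var_outside = predictive_variance([0.0, 6.0])
```

A query at $x = 6$, far from every training environment, gets essentially the full prior variance, which is the signal an active-learning loop would use to request new reference calculations.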

For multi-species or magnetic systems, descriptor compression and model extensions (e.g., spin-channel SOAP descriptors) enable tractable scaling and retention of physical distinctions (e.g., magnetic ordering in steel alloys) (Klawohn et al., 2023, Shenoy et al., 2023).

5. Computational Implementation and Scalability

The canonical GAP implementation is embedded in the QUIP/QUIPPY codebase, with interfaces to leading simulation engines (ASE, LAMMPS, CP2K, CASTEP) (Klawohn et al., 2023). Recent innovations in parallelization (MPI+OpenMP, ScaLAPACK-QR) allow the fitting of potentials to databases with millions of reference environments on large CPU clusters, removing the single-node memory ceiling (Klawohn et al., 2022). Descriptor evaluation and linear algebra stages are efficiently distributed, with memory and wall-time scaling shown to be nearly optimal up to thousands of cores.

Descriptor compression (“alchemical embeddings,” tensor reduction, “soap_turbo”) allows deployment on systems with large numbers of chemical elements, reducing the SOAP feature size while preserving accuracy (Klawohn et al., 2023). For typical parameterizations (SOAP $n_{\max}$ and $l_{\max} \approx 8$, $\zeta = 4$, 3000–8000 sparse points), single-point energy and force evaluations cost 0.04–2 ms/atom (Bernstein, 2024). This gives MD throughput orders of magnitude higher than ab initio MD, though still below that of simple empirical potentials.
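
To put the quoted per-atom costs in perspective, the arithmetic below converts them into MD steps per day. It assumes the cost is per core, scales linearly with atom count, and parallelizes perfectly, which real runs only approximate; the 1000-atom system is hypothetical.

```python
def md_steps_per_day(n_atoms, ms_per_atom, n_cores=1):
    """MD steps per day, assuming cost = n_atoms * ms_per_atom per core and ideal parallel scaling."""
    seconds_per_step = n_atoms * ms_per_atom * 1e-3 / n_cores
    return 86400.0 / seconds_per_step

fast = md_steps_per_day(1000, 0.04)   # lower end of the quoted cost range
slow = md_steps_per_day(1000, 2.0)    # upper end of the quoted cost range
```

This gives about 2.16 million and 43,200 steps per core-day, i.e. roughly 2.2 ns and 43 ps of simulated time with a 1 fs time step.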

6. Extensions, Applications, and Benchmarking

Beyond elemental and stoichiometric solids, GAP has been extended and validated for:

Multi-component alloys (Ag–Pd, Mo–Nb–Ta–V–W, Fe–Cr–Ni) (Rosenbrock et al., 2019, Shenoy et al., 2023, Jana et al., 2023).
Host–guest systems (e.g., Li intercalation in carbon): difference-GAP models augment a pre-trained host potential with a guest-specific GAP and explicit guest–guest analytic term, achieving faithful energetics and dynamics in battery materials (Fujikake et al., 2017).
Coarse-grained molecular simulations (GAP-CG): GPR applied to monomer/dimer/trimer descriptors reproduces many-body PMFs, surpasses site-based pair models in accuracy, and delivers significant speedup for biomolecular simulations (John, 2016).
Vibrational and phononic modeling (phonon spectra, thermal conductivity): adaptive regularization and focused augmentation of the training set enable DFT-level phonon prediction without loss of transferability to non-crystalline and anharmonic regimes (George et al., 2020).
Uncertainty quantification and integration with other machine-learned potential frameworks, such as the linear atomic cluster expansion (ACE) and the neural message-passing MACE architecture (Bernstein, 2024).
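
The additive structure of the difference-GAP approach for host–guest systems can be sketched as follows. The exponential guest–guest pair term and its parameters are hypothetical stand-ins (the analytic form used by Fujikake et al., 2017 is not reproduced here), and the host-model and DFT energies are placeholder numbers. The key point is that the correction GAP is trained on the residual left by the baseline terms, and the three pieces are summed at prediction time.

```python
import numpy as np

def guest_pair_energy(positions, A=1.0, rho=0.5):
    """Hypothetical analytic guest-guest repulsion: sum over pairs i<j of A * exp(-r_ij / rho)."""
    pos = np.asarray(positions, dtype=float)
    i, j = np.triu_indices(len(pos), k=1)
    r = np.linalg.norm(pos[i] - pos[j], axis=1)
    return float(np.sum(A * np.exp(-r / rho)))

def difference_targets(E_ref, E_host, E_pair):
    """Training targets for the correction GAP: what host model + pair term leave unexplained."""
    return np.asarray(E_ref) - np.asarray(E_host) - np.asarray(E_pair)

def predict_total(E_host, E_pair, dE_gap):
    """Prediction: baseline host potential + analytic pair term + learned correction."""
    return E_host + E_pair + dE_gap

guests = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]]   # two guest atoms 0.5 Å apart (toy geometry)
E_pair = guest_pair_energy(guests)             # equals exp(-1) with the default A and rho
E_host, E_ref = -100.0, -101.2                 # placeholder host-model and DFT energies (eV)
target = difference_targets(E_ref, E_host, E_pair)
```

If the correction GAP reproduced `target` exactly, `predict_total(E_host, E_pair, target)` would recover the reference energy.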

The principal benchmarks confirm that SOAP-GAP achieves DFT-level errors for energies, forces, elastic constants, phonons, and defect properties, with evaluation speedups of several orders of magnitude compared to direct ab initio MD (Szlachta, 2014, Klawohn et al., 2023). In alloys and liquid or heterogeneous systems, GAP outperforms empirical and lower-order machine-learned potentials in both transferability and physical extrapolation (Rosenbrock et al., 2019, Zhang et al., 2022).

7. Limitations and Comparative Perspective

GAP's strengths—quantum-level accuracy, systematic improvability, Bayesian uncertainty, rigorous regression, and robust symmetry—are balanced by certain limitations:

The strict locality assumption inhibits the description of long-range Coulomb or vdW interactions unless specialized terms are included (Unruh et al., 2021).
Computational overhead compared to polynomial or linear models (e.g., MTP, ACE) becomes significant at very large system sizes or when large cutoffs or fine SOAP expansions are needed (Bernstein, 2024).
The accuracy in extrapolative regimes is governed by the physics and coverage of the training set; unphysical predictions can occur if such regimes are sampled poorly (Zhang et al., 2022).
Hyperparameter selection can be delicate, especially for multi-descriptor, multi-species models; “best practices” recommend cross-validation, marginal likelihood maximization, or physical heuristics (Klawohn et al., 2023).
For many-body and disordered systems, descriptor and kernel choice directly control the upper limit of achievable accuracy, as evidenced by locality tests (Deringer et al., 2016).
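
Marginal likelihood maximization, one of the strategies recommended above, can be sketched for a plain (non-sparse) 1-D GP with a squared-exponential kernel. The toy data ($\sin x$ plus noise), the fixed noise level, and the candidate length scales are assumptions; a real multi-descriptor GAP has many more hyperparameters and would use gradient-based or heuristic search rather than a four-point grid.

```python
import numpy as np

def sq_exp(a, b, theta):
    """Squared-exponential kernel with unit prior variance."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * theta**2))

def log_marginal_likelihood(x, y, theta, sigma):
    """log p(y | x, theta, sigma) for a zero-mean GP, computed via a Cholesky factor."""
    K = sq_exp(x, x, theta) + sigma**2 * np.eye(len(x))
    chol = np.linalg.cholesky(K)                              # K = chol @ chol.T
    a = np.linalg.solve(chol.T, np.linalg.solve(chol, y))     # a = K^{-1} y
    return float(-0.5 * y @ a - np.log(np.diag(chol)).sum() - 0.5 * len(x) * np.log(2.0 * np.pi))

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, size=40))
y = np.sin(x) + rng.normal(scale=0.1, size=40)   # toy target with noise level 0.1
thetas = [0.05, 0.2, 1.0, 5.0]                    # candidate length scales
scores = [log_marginal_likelihood(x, y, t, sigma=0.1) for t in thetas]
best_theta = thetas[int(np.argmax(scores))]
```

The marginal likelihood trades data fit against model complexity: very short length scales explain the data as noise, very long ones cannot follow the oscillation, and the grid search should settle on the intermediate value $\theta = 1$.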

Comparatively, GAP remains the reference standard for nonparametric, fully many-body ML potentials, but high-speed linear-basis expansions (ACE) and message-passing architectures (MACE) increasingly close the accuracy–efficiency gap (Bernstein, 2024). The distinctive flexibility, uncertainty quantification, and systematic Bayesian regression of GAP anchor its ongoing role in the advancement of atomistic simulation.

Key references:

(Bartók, 2010, Bartók et al., 2015, Klawohn et al., 2023, Bernstein, 2024, Szlachta, 2014, Zhang et al., 2022, Unruh et al., 2021, Jana et al., 2023, Banik et al., 2024, Deringer et al., 2016, Fujikake et al., 2017, Byggmästar et al., 2020, Klawohn et al., 2022, Rosenbrock et al., 2019, Barker et al., 2016, John, 2016, Shenoy et al., 2023, George et al., 2020)

For detailed formulae, protocols, and implementation specifics, see the cited works and the GAP software documentation (Klawohn et al., 2023, Klawohn et al., 2022).