SOAP Descriptor: Atomic Environment Fingerprint

Updated 12 October 2025

The SOAP descriptor is a high-dimensional, symmetry-invariant representation that encodes local atomic environments using continuous density functions expanded in radial functions and spherical harmonics.
It projects atom-centered densities onto this basis and contracts the resulting coefficients so that the descriptor is invariant to rotations, translations, and permutations.
Extensions such as TimeSOAP and Magnetic SOAP cover dynamic and magnetic environments; the same acronym also names an unrelated neural network optimizer and a clinical note format, both of which are surveyed here.

The Smooth Overlap of Atomic Positions (SOAP) descriptor is a high-dimensional, symmetry-invariant vector representation designed to encode the local geometry and, in its generalizations, the functional and dynamic environments of atoms or structural units in molecules, materials, and other complex systems. SOAP is widely applied in atomistic simulation, computational materials science, and condensed matter physics, including machine-learning-driven interatomic potential fitting, phase transition analysis, and magnetic structure recognition. The acronym SOAP is also used, with unrelated meanings, for a neural network optimizer and for a clinical note format; both are covered in this entry. This entry surveys the mathematical formulation, algorithmic developments, key applications, limitations, and future perspectives.

1. Mathematical Formulation and Variants

At its core, SOAP constructs a continuous atom-centered density function, which is expanded in a suitable basis to create a vector “fingerprint” of the local environment. For an atom $i$, the local neighbor density is defined as

$\rho_i(\mathbf{r}) = \sum_j \exp\left(-\frac{|\mathbf{r}-\mathbf{r}_{ij}|^2}{2\sigma^2}\right) f_{\text{rcut}}(|\mathbf{r}-\mathbf{r}_{ij}|)$

where $\mathbf{r}_{ij}$ are neighbor positions relative to atom $i$, $\sigma$ is the Gaussian width, and $f_{\text{rcut}}$ is a smooth cutoff. The density is projected onto orthonormal radial functions $R_n(r)$ and spherical harmonics $Y_{lm}(\hat{\mathbf{r}})$ to yield expansion coefficients; these coefficients are further contracted (typically into the “power spectrum”) to generate descriptors invariant to rotations, translations, and atom permutations.
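The projection and contraction steps can be illustrated with a small, simplified sketch. It is not a production SOAP implementation and rests on several assumptions made here for brevity: the Gaussian smearing is taken in the limit $\sigma \to 0$ (each neighbor is a point), the radial functions are plain Gaussians that are neither orthogonal nor normalized, a cosine cutoff is applied to the neighbor distance, and only real spherical harmonics up to $l = 2$ are included. With coefficients $c_{nlm} = \sum_j g_n(r_{ij}) Y_{lm}(\hat{\mathbf{r}}_{ij})$, the contraction used is $p_{nn'l} = \sum_m c_{nlm} c_{n'lm}$. All function names are hypothetical.

```python
import math

def real_sph_harm(l, x, y, z):
    """Real spherical harmonics Y_lm (all m) for l = 0, 1, 2 at unit vector (x, y, z)."""
    if l == 0:
        return [0.5 / math.sqrt(math.pi)]
    if l == 1:
        c = math.sqrt(3.0 / (4.0 * math.pi))
        return [c * y, c * z, c * x]
    if l == 2:
        a = 0.5 * math.sqrt(15.0 / math.pi)
        b = 0.25 * math.sqrt(5.0 / math.pi)
        return [a * x * y, a * y * z, b * (3.0 * z * z - 1.0),
                a * x * z, 0.5 * a * (x * x - y * y)]
    raise ValueError("only l <= 2 is implemented in this sketch")

def cosine_cutoff(r, r_cut):
    """Smooth weight that falls from 1 at r = 0 to 0 at r = r_cut (an assumed choice)."""
    return 0.5 * (1.0 + math.cos(math.pi * r / r_cut)) if r < r_cut else 0.0

def power_spectrum(neighbors, r_cut=5.0, n_max=3, l_max=2, width=0.5):
    """Return the list of p_{nn'l} for neighbor positions given relative to the central atom."""
    centers = [r_cut * n / n_max for n in range(n_max)]  # radial Gaussian centers
    coeffs = {(n, l): [0.0] * (2 * l + 1)
              for n in range(n_max) for l in range(l_max + 1)}
    for (x, y, z) in neighbors:
        r = math.sqrt(x * x + y * y + z * z)
        w = cosine_cutoff(r, r_cut)
        if r == 0.0 or w == 0.0:
            continue  # skip the central atom and anything beyond the cutoff
        for n in range(n_max):
            g = w * math.exp(-(r - centers[n]) ** 2 / (2.0 * width ** 2))
            for l in range(l_max + 1):
                c = coeffs[(n, l)]
                for m, y_lm in enumerate(real_sph_harm(l, x / r, y / r, z / r)):
                    c[m] += g * y_lm
    # Contract over m; keep only n' >= n because p_{nn'l} is symmetric in (n, n').
    return [sum(a * b for a, b in zip(coeffs[(n, l)], coeffs[(n2, l)]))
            for n in range(n_max) for n2 in range(n, n_max) for l in range(l_max + 1)]

def rotate(p, a, b):
    """Rotate point p about the z axis by angle a, then about the x axis by angle b."""
    x, y, z = p
    x, y = x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a)
    y, z = y * math.cos(b) - z * math.sin(b), y * math.sin(b) + z * math.cos(b)
    return (x, y, z)

env = [(1.0, 0.0, 0.0), (0.0, 1.5, 0.5), (-1.2, 0.3, 2.0)]
p_ref = power_spectrum(env)
p_rot = power_spectrum([rotate(p, 0.7, -1.1) for p in env])
max_diff = max(abs(a - b) for a, b in zip(p_ref, p_rot))
```

Here `max_diff` should sit at the level of floating-point rounding, which is the rotational invariance the power spectrum is built to provide. Permutation invariance follows from the sum over neighbors, and translation invariance from using positions relative to the central atom.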

Generalizations of SOAP include:

Time-dependent SOAP (TimeSOAP, tSOAP): Tracks temporal variations in SOAP vectors to characterize dynamic processes in molecular systems (Caruso et al., 2023).
Magnetic SOAP: Extends the scalar density to vector-valued “magnetization densities” smeared with Gaussians, and expands them in a basis of vector spherical harmonics to encode both atomic arrangement and magnetic configuration (Suzuki et al., 2023).
Gradient Whitening SOAP (Shampoo with Adam in the Preconditioner's Eigenbasis): Reuses the SOAP acronym for a neural network optimizer, unrelated to atomic descriptors, whose update has been analyzed as a whitening transformation of the gradient (Lu et al., 26 Sep 2025).
SOAP Note Generation (Clinical): Uses SOAP as an initialism (Subjective, Objective, Assessment, Plan) for structured automated documentation in healthcare (distinct in domain and not related to atomic position overlap) (Kamal et al., 7 Aug 2025).

2. SOAP for Structural Representation in Atomistic Systems

The canonical application of SOAP is in atomic- and molecular-level simulation, where it encodes atomic environments for use in interatomic potentials, clustering, similarity analysis, and machine learning. The SOAP vector for site $i$ is typically

$\mathbf{p}_i = \{\gamma_{nn'l}\}$

where the $\gamma_{nn'l}$ are feature coefficients derived from the basis expansion, indexed by the radial channels $n$, $n'$ and the angular channel $l$.

SOAP-based similarity measures are constructed as normalized inner products (“kernels”):

$K^{\text{SOAP}}(i, j) = \frac{\mathbf{p}_i \cdot \mathbf{p}_j}{|\mathbf{p}_i||\mathbf{p}_j|}$

with the associated “SOAP distance”

$d^{\text{SOAP}}(i, j) = \sqrt{2-2K^{\text{SOAP}}(i, j)}$

The resulting kernel functions have been used as input to Gaussian Process Regression, neural networks, and other ML methods for tasks such as fitting accurate molecular force fields, accelerating crystal structure searches, and extracting structure–property relationships.
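The kernel and distance defined above reduce to a few lines of code. The following is a minimal sketch using plain Python lists as SOAP vectors; the function names `soap_kernel` and `soap_distance` are chosen here for illustration and do not refer to any particular library.

```python
import math

def soap_kernel(p_i, p_j):
    """Normalized inner product of two SOAP vectors (cosine similarity)."""
    dot = sum(a * b for a, b in zip(p_i, p_j))
    norm_i = math.sqrt(sum(a * a for a in p_i))
    norm_j = math.sqrt(sum(b * b for b in p_j))
    if norm_i == 0.0 or norm_j == 0.0:
        raise ValueError("SOAP vectors must be non-zero")
    return dot / (norm_i * norm_j)

def soap_distance(p_i, p_j):
    """SOAP distance sqrt(2 - 2K); the clamp guards against K slightly above 1 from rounding."""
    return math.sqrt(max(0.0, 2.0 - 2.0 * soap_kernel(p_i, p_j)))

same = soap_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # parallel vectors
orthogonal = soap_distance([1.0, 0.0], [0.0, 1.0])        # no overlap
```

Because the kernel is normalized, vectors that differ only by a positive scale factor have kernel 1 and distance 0 (up to rounding, which the square root amplifies to roughly the $10^{-8}$ level), while orthogonal vectors reach $\sqrt{2}$. Power-spectrum entries built from a non-negative density typically yield kernels between 0 and 1, so the distance then lies between 0 and $\sqrt{2}$.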

TimeSOAP (Caruso et al., 2023) introduces the time-differentiated SOAP distance,

$\lambda_i^{t+\Delta t} = \frac{\sqrt{2-2K^{\text{SOAP}}(i^t, i^{t+\Delta t})}}{\Delta t}$

enabling the segmentation of dynamic domains and detection of local atomic rearrangements, transitions, and rare events directly from molecular dynamics trajectories through unsupervised clustering of $\{\lambda_i^{(t)}\}$ and their time derivatives.
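As a rough illustration of the TimeSOAP signal defined above, the sketch below computes $\lambda_i$ for one atom from a sequence of its SOAP vectors at consecutive frames. It covers only the finite-difference step, not the clustering; the function names and the toy trajectory are assumptions made here.

```python
import math

def soap_distance(p, q):
    """SOAP distance sqrt(2 - 2K) between two SOAP vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return math.sqrt(max(0.0, 2.0 - 2.0 * dot / norm))

def timesoap_signal(trajectory, dt=1.0):
    """Return lambda at each step: SOAP distance between consecutive frames divided by dt."""
    return [soap_distance(trajectory[t], trajectory[t + 1]) / dt
            for t in range(len(trajectory) - 1)]

# Toy trajectory: the environment is static, changes abruptly once, then is static again.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
lam = timesoap_signal(frames, dt=0.5)
```

In this toy case `lam` is zero for the static steps and peaks at the step where the environment changes, which is the kind of feature that a subsequent clustering step would pick up as a local rearrangement.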

3. Extensions: SOAP Descriptors for Magnetic Structures

The SOAP approach has been extended to encode magnetic environments by constructing a continuous magnetization density from localized atomic moments:

$\mathbf{m}(\mathbf{r}) = \sum_{j=1}^N \exp(-\alpha|\mathbf{r}-\mathbf{R}_j|^2)\mathbf{m}_j$

which is then expanded in a basis $\{\phi_n(r)Y_{lm}^L(\hat{\mathbf{r}})\}$ of radial functions and vector spherical harmonics. The expansion coefficients $c_{nL\ell m}$ are contracted into rotational invariants (second-order “power spectrum” and fourth-order “trispectrum”) to build descriptors that account for both structural and magnetic symmetries (Suzuki et al., 2023). Higher-order partial spectra are necessary to discriminate magnetic structures differing by anisotropy or symmetry class, as demonstrated on Mn$_3$Ir and Mn$_3$Sn, using similarity kernels $K^{(2)}$ and $K^{(4)}$ formed from the corresponding spectra.
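To make the magnetization density at the start of this section concrete, the sketch below evaluates $\mathbf{m}(\mathbf{r})$ at a point for a small set of atomic moments. It covers only the Gaussian smearing step, not the expansion in vector spherical harmonics or the invariant contraction; the function name and the two-atom example are assumptions made here.

```python
import math

def magnetization_density(r, positions, moments, alpha=1.0):
    """Evaluate m(r) = sum_j exp(-alpha |r - R_j|^2) m_j as a 3-component tuple."""
    total = [0.0, 0.0, 0.0]
    for R, m in zip(positions, moments):
        dist2 = sum((a - b) ** 2 for a, b in zip(r, R))
        weight = math.exp(-alpha * dist2)
        for k in range(3):
            total[k] += weight * m[k]
    return tuple(total)

# Antiparallel pair of moments along z, separated by 2 length units along x.
positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
moments = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]

at_first_atom = magnetization_density((0.0, 0.0, 0.0), positions, moments)
at_midpoint = magnetization_density((1.0, 0.0, 0.0), positions, moments)
```

At the first atom the density is dominated by that atom's moment, reduced by the Gaussian tail of its neighbor, while at the midpoint the two antiparallel contributions cancel. A scalar density built from the positions alone would not distinguish this arrangement from a parallel one, which is what motivates the vector-valued extension.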

4. Theoretical Foundations and Generalizations

SOAP’s design ensures invariance to key symmetry groups (rotational, translational, permutational) and offers continuity and differentiability, making it particularly suitable for ML tasks in physical sciences. The transformations underpinning SOAP have also prompted connections to covering space methods for minimal surfaces—where the topology of complex objects is encoded in coverings and mapped to robust geometric descriptors (Bellettini et al., 2017). In this context, one creates a “descriptor” for a minimal surface (such as a soap film with tunnel or triple line) via constraints imposed on lifted functions in the covering space, establishing a parallel to encoding topological complexity in the SOAP framework.

Beyond atomistic physical systems, “SOAP” is also an acronym in clinical documentation (Subjective, Objective, Assessment, Plan) rather than a geometric descriptor. In machine-generated medical notes, frameworks such as Skin-SOAP (Kamal et al., 7 Aug 2025) employ multimodal weakly supervised learning with retrieval-augmented LLMs to generate structured documentation, incorporating evaluation metrics such as MedConceptEval and Clinical Coherence Score (CCS).

5. SOAP as Gradient Whitening in Neural Network Optimization

Shampoo with Adam in the Preconditioner's Eigenbasis (SOAP) recasts the acronym in a neural network optimization context (Lu et al., 26 Sep 2025). Here, SOAP denotes an optimizer that rotates gradients into the eigenbasis of the Shampoo preconditioners, applies Adam-style normalization there, and rotates the updates back. The theoretical contribution is an equivalence result: when the whitening matrix (the covariance of the vectorized gradients) factors as a Kronecker product, SOAP’s updates are mathematically equivalent to those of Shampoo. This is formalized as

$\Sigma = (E[G^T G] \otimes E[GG^T]) / \mathrm{Tr}(E[GG^T])$

with optimizer updates performed along whitening directions. Empirical results show nearly identical convergence rates and final loss values for SOAP and Shampoo on language modeling and image colorization tasks, with second-order methods converging faster than Adam in early iterations but exhibiting no systematic advantage in final performance.
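The rotate, normalize, and rotate-back structure described above can be written out as a short sequence of updates. This is a hedged sketch rather than the exact algorithm of the cited work: the symbols $W$ (weight matrix with gradient $G$), $L$ and $R$ (running estimates of the two second-moment factors), $Q_L$, $Q_R$, $M'$, $V'$, $\eta$, and $\epsilon$ are notation introduced here, and details such as moment decay rates and how often the eigenbases are refreshed are omitted.

$L \approx E[GG^T] = Q_L \Lambda_L Q_L^T, \qquad R \approx E[G^T G] = Q_R \Lambda_R Q_R^T$

$G' = Q_L^T \, G \, Q_R$

$N' = M' / (\sqrt{V'} + \epsilon)$

$W \leftarrow W - \eta \, Q_L N' Q_R^T$

Here $M'$ and $V'$ denote Adam-style first and second moment estimates accumulated for the rotated gradient $G'$, and the division and square root are elementwise. The first line builds the eigenbases, the second rotates the gradient into them, the third applies the Adam-style normalization, and the last rotates the update back to the original coordinates.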

6. Evaluation, Limitations, and Advanced Use Cases

SOAP descriptors are evaluated by their capacity to distinguish environments (e.g., phases, magnetic orders, or structural motifs) and to robustly predict target properties in regression/classification frameworks. High-dimensional SOAP vectors enable fine discrimination of atomic-scale environments but may incur computational overhead. While well suited for homogeneous chemical environments (e.g., elemental crystals, pure metals), discriminative performance and computational scalability can be challenged in the presence of large numbers of atomic species, complex long-range correlations, or highly dynamic situations.
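The scaling with expansion rank and number of species can be made concrete by counting power-spectrum entries. The sketch below assumes one common counting convention, in which each (species, radial index) pair is a channel, only unique unordered pairs of channels are kept, and each pair carries $l_{\max} + 1$ angular entries; actual implementations may count differently, and the function name is hypothetical.

```python
def soap_feature_count(n_species, n_max, l_max):
    """Number of power-spectrum entries under the unique-channel-pair convention."""
    channels = n_species * n_max
    return channels * (channels + 1) // 2 * (l_max + 1)

single_species = soap_feature_count(n_species=1, n_max=8, l_max=6)
four_species = soap_feature_count(n_species=4, n_max=8, l_max=6)
```

Under this convention the count grows quadratically with the number of species: with $n_{\max} = 8$ and $l_{\max} = 6$, one species gives 252 entries and four species give 3696, which is one reason multi-species systems strain both memory and kernel evaluation time.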

Algorithmic trade-offs include:

The need for high expansion ranks (radial and angular) to capture subtle differences (especially in magnetic SOAP).
Preconditioner update frequency and stability in optimization (as in Shampoo and SOAP optimizers).
The handling of time-series versus static clustering in analysis applications (TimeSOAP), influencing the detection of kinetic versus thermodynamic domains.

SOAP and its variants have been instrumental in:

Delineating solid/liquid interfaces in molecular simulations by time-resolved clustering (Caruso et al., 2023).
High-resolution discrimination of magnetic phases in Mn$_3$Ir, Mn$_3$Sn, and model chains (Suzuki et al., 2023).
Structured generation of clinical notes, evaluated via domain-adapted metrics for both content and semantic fidelity (Kamal et al., 7 Aug 2025).
Comparative studies of second-order geometry-aware optimizers in large-scale ML (Lu et al., 26 Sep 2025).

7. Perspectives and Future Directions

Recent work has highlighted several frontiers for SOAP-based methodology:

Integration with explicit time-series classification for improved kinetic modeling of atomic rearrangements (Caruso et al., 2023).
Further development of higher-order invariants and parameter optimization in magnetic and multipolar SOAP kernels (Suzuki et al., 2023).
Extension to more diverse materials classes, complex disordered or solution systems, and large-scale biophysical assemblies.
In optimizers, refinement of whitening approximations and scaling to very large neural architectures (Lu et al., 26 Sep 2025).
In clinical documentation, expanding multimodal, weakly supervised SOAP note generation to broader medical specialties and user-in-the-loop correction (Kamal et al., 7 Aug 2025).

The “SOAP family”, as the term is used in this entry, spans symmetry-aware descriptors for environment fingerprinting in condensed matter and computational chemistry, together with namesake methods in deep learning and natural language processing that share only the acronym. The descriptor variants are unified by a focus on robust, invariant, and information-rich representations of local environments in complex systems.