Atomistic Generative Diffusion (AGeDi)

Updated 26 July 2025

AGeDi is a class of deep generative models that reverses noise processes to create both atomic positions and types in a physically coherent manner.
It employs techniques like equivariant score networks, interleaved discrete-continuous diffusion, and classifier-free guidance to ensure chemically consistent structure generation.
The approach enables targeted inverse design in molecules, clusters, 2D materials, and solids, validated by precision–recall metrics against synthetic baselines.

Atomistic Generative Diffusion (AGeDi) refers to a class of deep generative models that use score-based diffusion processes to generate atomistic structures, typically atomic positions together with atomic types, in a physically principled way that captures complex chemical and structural correlations. These models have been applied to a wide range of atomistic systems, including molecules, clusters, two-dimensional (2D) materials, and solids, and are emerging as a flexible paradigm for inverse design and materials discovery (Rønne et al., 24 Jul 2025).

1. Core Principles and Model Formulation

AGeDi models integrate two diffusion processes: one continuous and one discrete, each tailored to the physical nature of atomistic data.

Continuous Score-Based Diffusion for Atomic Positions: Atomic coordinates are generated by reversing a stochastic differential equation (SDE) that incrementally adds noise to atomic positions during a forward “noising” process. The forward SDE for the variance-preserving (VP) case can be written as

$d\mathbf{R}_t = -\frac{1}{2} \beta(t) \mathbf{R}_t dt + \sqrt{\beta(t)} d\mathbf{W}_t,$

where $\mathbf{R}_t$ denotes the atomic positions at diffusion time $t$, $\beta(t)$ is the noise schedule, and $\mathbf{W}_t$ is a standard Wiener process (Brownian motion). The reverse SDE uses the learned score function $s_{\theta}$, an approximation to $\nabla_{\mathbf{R}_t} \log p_t(\mathbf{R}_t)$, to guide atomic positions from the noise prior back to the data manifold.
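The VP forward process has a closed-form Gaussian perturbation kernel, so noisy positions at any time can be sampled directly without simulating the SDE. The following is a minimal NumPy sketch of that kernel and of one Euler-Maruyama step of the corresponding reverse SDE. The linear schedule, the values `beta_min` and `beta_max`, and all function names are illustrative assumptions, not taken from the AGeDi package.

```python
import numpy as np

def beta(t, beta_min=0.1, beta_max=20.0):
    """Linear noise schedule on t in [0, 1] (an assumed choice)."""
    return beta_min + t * (beta_max - beta_min)

def integrated_beta(t, beta_min=0.1, beta_max=20.0):
    """Closed-form integral of beta(s) from 0 to t for the linear schedule."""
    return beta_min * t + 0.5 * (beta_max - beta_min) * t**2

def vp_perturb(positions, t, rng):
    """Sample R_t given R_0: the mean shrinks and the std grows towards 1."""
    b = integrated_beta(t)
    mean_scale = np.exp(-0.5 * b)
    std = np.sqrt(1.0 - np.exp(-b))
    noise = rng.standard_normal(positions.shape)
    return mean_scale * positions + std * noise, noise, std

def reverse_step(positions, t, dt, score, rng):
    """One Euler-Maruyama step of the reverse SDE, from time t to t - dt."""
    b = beta(t)
    # Reverse drift: forward drift minus g(t)^2 times the score, with g^2 = beta.
    drift = -0.5 * b * positions - b * score
    noise = rng.standard_normal(positions.shape)
    return positions - drift * dt + np.sqrt(b * dt) * noise

rng = np.random.default_rng(0)
R0 = rng.uniform(-2.0, 2.0, size=(8, 3))   # 8 atoms, Cartesian coordinates
Rt, eps, std = vp_perturb(R0, 0.5, rng)
# Denoising score-matching regression target for the score at (R_t, t).
target_score = -eps / std
```

In training, a score network would regress onto `target_score`. At sampling time its output would take the place of `score` in `reverse_step`.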

Continuous-Time Discrete Diffusion for Atomic Types: Atomic types (elemental identities) are modeled as discrete categorical variables. The forward process is defined as a continuous-time Markov process,

$\frac{dp_t}{dt} = Q_t p_t,$

where $p_t$ is the probability vector over atomic types at time $t$ and $Q_t$ is the transition rate matrix, typically chosen to drive all atoms towards a masked type as $t \to 1$. The reverse process relies on a learned “concrete score” (the analog of a gradient for discrete variables) to reconstruct the physical atomic types (Rønne et al., 24 Jul 2025).
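For a masking (absorbing) process, the forward dynamics reduce to independently replacing each atom's type by the mask with a time-dependent probability. The sketch below assumes a rate $\sigma(t) = 1/(1-t)$, for which the mask probability at time $t$ is simply $t$. The `MASK` index, the compact integer type labels, and the function names are hypothetical choices for illustration.

```python
import numpy as np

MASK = 0  # hypothetical index reserved for the masked (absorbing) type

def mask_probability(t):
    """Probability that an atom has been absorbed into MASK by time t.

    Assumes sigma(t) = 1 / (1 - t), so exp(-integral of sigma) = 1 - t.
    """
    return np.clip(t, 0.0, 1.0)

def noise_types(types, t, rng):
    """Forward-noise integer atomic types by masking each atom independently."""
    types = np.asarray(types)
    masked = rng.random(types.shape) < mask_probability(t)
    return np.where(masked, MASK, types)

def rate_matrix(num_types, sigma):
    """Absorbing rate matrix Q for dp/dt = Q p, with p a column vector.

    Q[i, j] is the rate of jumping from type j to type i, so every column
    sums to zero and the MASK column is zero (nothing leaves the mask).
    """
    Q = np.zeros((num_types + 1, num_types + 1))
    for k in range(1, num_types + 1):
        Q[k, k] = -sigma      # probability leaves type k ...
        Q[MASK, k] = sigma    # ... and flows into the mask state
    return Q

rng = np.random.default_rng(0)
types = np.array([1, 1, 2, 2, 2])      # e.g. 1 and 2 label two elements
half_noised = noise_types(types, 0.5, rng)
```

At `t = 0` the types are untouched and at `t = 1` every atom is masked, which matches the limiting behavior described above.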

Joint Generation: AGeDi couples these two processes, allowing coherent sampling of both atomic coordinates and identities in a manner that captures their intrinsic correlations. During sampling, denoising steps for positions and types are interleaved, ensuring the resulting structure is both geometrically and chemically consistent.
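The interleaving can be pictured as a single loop that alternates a position update with a type update at every time step. The sketch below is a toy version under strong assumptions: `toy_score` and `toy_type_probs` are stand-ins for the learned networks, the linear noise schedule and the mask schedule (mask probability equal to $t$) are assumed, and none of the names correspond to the AGeDi package.

```python
import numpy as np

MASK = 0  # hypothetical index for the masked type

def toy_score(positions, types, t):
    """Stand-in for the learned position score; exact if the data were N(0, I)."""
    return -positions

def toy_type_probs(positions, types, t):
    """Stand-in for the learned type model: fixed odds over types 1 and 2."""
    return np.tile([0.0, 0.5, 0.5], (len(types), 1))

def sample(num_atoms, num_steps=100, beta_min=0.1, beta_max=20.0, seed=0):
    rng = np.random.default_rng(seed)
    positions = rng.standard_normal((num_atoms, 3))  # Gaussian prior at t = 1
    types = np.full(num_atoms, MASK)                 # fully masked at t = 1
    dt = 1.0 / num_steps
    for i in range(num_steps, 0, -1):
        t = i * dt
        # Position update: one Euler-Maruyama step of the reverse VP SDE,
        # with the score evaluated on the current (partly masked) types.
        b = beta_min + t * (beta_max - beta_min)
        score = toy_score(positions, types, t)
        drift = -0.5 * b * positions - b * score
        noise = rng.standard_normal(positions.shape)
        positions = positions - drift * dt + np.sqrt(b * dt) * noise
        # Type update: unmask each masked atom with probability dt / t,
        # drawing its element from probabilities that see the new positions.
        probs = toy_type_probs(positions, types, t)
        for a in np.flatnonzero(types == MASK):
            if rng.random() < dt / t:
                types[a] = rng.choice(len(probs[a]), p=probs[a])
    return positions, types

positions, types = sample(num_atoms=6)
```

Because the unmasking probability `dt / t` reaches 1 on the final step, no atom is left masked when the loop ends. In a real model both stand-in functions would be outputs of one network that reads the current positions and types together, which is where the coupling between geometry and chemistry enters.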

2. Methodological Innovations

Several innovations distinguish state-of-the-art AGeDi frameworks:

Equivariant Score Networks: The score function for atomic positions is parameterized by a rotationally and translationally equivariant graph neural network (GNN). Such architectures, often from the PaiNN family or related designs, guarantee that predictions respect the symmetries of 3D space.
Interleaved Discrete-Continuous Diffusion: By treating atomic types and positions as two coupled modalities (discrete and continuous, respectively), AGeDi can generate diverse multi-element systems, such as multimetallic clusters, as well as structures with target stoichiometries (Rønne et al., 24 Jul 2025).
Atomic Type Interpolation: The discrete diffusion framework allows continuous interpolation between element embeddings. For example, to generate bimetallic clusters from purely monometallic training data, one can interpolate atomic embeddings as

$E_{AB} = \alpha E_A + (1 - \alpha) E_B, \quad \alpha \sim \mathrm{Uniform}(0, 1).$

When the interpolated embeddings are resolved into discrete elements during sampling, the final type assignment draws on the structure–type correlations learned during training.
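The interpolation formula can be sketched in a few lines of NumPy. The embedding dimension, the random stand-in embeddings, and the choice to draw one $\alpha$ per atom (rather than one per structure) are assumptions made for illustration; the source does not specify them.

```python
import numpy as np

def interpolate_embeddings(emb_a, emb_b, num_atoms, rng):
    """Mix two element embeddings with alpha ~ Uniform(0, 1), one per atom."""
    alpha = rng.uniform(0.0, 1.0, size=(num_atoms, 1))
    mixed = alpha * emb_a + (1.0 - alpha) * emb_b
    return mixed, alpha

rng = np.random.default_rng(0)
dim = 16                                  # hypothetical embedding size
emb_a = rng.standard_normal(dim)          # stand-in for the learned E_A
emb_b = rng.standard_normal(dim)          # stand-in for the learned E_B
mixed, alpha = interpolate_embeddings(emb_a, emb_b, num_atoms=5, rng=rng)
```

Each row of `mixed` lies on the line segment between the two element embeddings, so the network is queried with inputs that sit between the monometallic cases it saw in training.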

Classifier-Free Guidance for Conditional Generation: For conditional tasks (e.g., generating 2D materials with a specified symmetry), classifier-free guidance (CFG) is used to bias the score model towards samples that satisfy desired properties or constraints. The score at each denoising step is interpolated between conditional and unconditional predictions:

$\tilde{s}_\theta(M_t, t, y) = w\, s_\theta(M_t, t, y) + (1 - w)\, s_\theta(M_t, t, \varnothing),$

where $M_t$ is the current state, $y$ is the conditioning variable, and $w$ is a tunable guidance parameter.
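The guided score is a linear combination of two network evaluations, as the short sketch below shows. The function names are hypothetical. The label-dropping helper reflects the usual classifier-free guidance training recipe, in which the condition is randomly replaced by a null token; whether AGeDi uses exactly this scheme, and with what dropout rate, is an assumption here.

```python
import numpy as np

def guided_score(score_cond, score_uncond, w):
    """Blend conditional and unconditional scores with guidance weight w."""
    return w * score_cond + (1.0 - w) * score_uncond

def maybe_drop_label(y, p_uncond, rng):
    """Training-time helper: replace the label by None with prob. p_uncond."""
    return None if rng.random() < p_uncond else y

# Toy scores for a 2-atom system; real values would come from the network
# evaluated once with the condition y and once with the null condition.
score_cond = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
score_uncond = np.array([[0.5, 0.0, 0.0], [0.0, 0.0, 0.0]])
blended = guided_score(score_cond, score_uncond, w=2.0)
```

With this parameterization, `w = 0` recovers the unconditional score and `w = 1` the conditional one. Values `w > 1` extrapolate past the conditional prediction, since the expression equals `score_uncond + w * (score_cond - score_uncond)`.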

3. Applications of AGeDi

AGeDi frameworks have demonstrated versatility across chemical and materials domains:

Metallic Clusters: Trained on the Quantum Cluster Database (QCD), AGeDi can generate low-energy monometallic or bimetallic nanoclusters using only monometallic training data. Precision–recall (PR) analysis against synthetic baselines indicates that the generated clusters match the fidelity and diversity of the reference ensemble.
2D Materials: Using the C2DB dataset, AGeDi generates novel two-dimensional crystals by conditioning on symmetry labels (e.g., layer group number). Classifier-free guidance steers the reverse diffusion process, enabling generation of structures with targeted crystallographic symmetry.
Fidelity and Diversity Evaluation: Metrics such as precision (the proportion of generated samples close to reference structures) and recall (the coverage of reference structures by generated samples) are quantified by comparison to synthetically perturbed datasets with known noise levels. This enables tuning of generation quality and diversity in a principled manner (Rønne et al., 24 Jul 2025).

4. Performance Metrics and Benchmarking

Performance of AGeDi models is evaluated using quantitative metrics that disentangle sample quality from diversity and compare against established physical and data-driven baselines:

Metric	Definition	Significance
Precision	Fraction of generated samples near the training distribution	Measures fidelity (quality)
Recall	Fraction of training data covered by generated samples	Measures diversity
PR curves	Plots precision versus recall across threshold variations	Comprehensive fidelity–diversity map
Synthetic baselines	Reference sets with controlled Gaussian perturbations or subsampling	Provide interpretable anchors

Interpreting PR scores against these baselines allows for objective assessment of whether a generative model captures both the fine-scale details and the overall breadth of the training set.
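A simplified version of these metrics can be written with a distance threshold in a descriptor space. The sketch below assumes that each structure has already been mapped to a fixed-length descriptor vector and uses a plain nearest-neighbor radius test. The exact construction used in the cited work may differ, and all names here are hypothetical.

```python
import numpy as np

def pairwise_distances(a, b):
    """Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def precision_recall(generated, reference, radius):
    """Threshold-based precision and recall in descriptor space.

    precision: fraction of generated samples within `radius` of a reference
    recall:    fraction of reference samples within `radius` of a generated one
    """
    d = pairwise_distances(generated, reference)
    precision = np.mean(d.min(axis=1) <= radius)
    recall = np.mean(d.min(axis=0) <= radius)
    return float(precision), float(recall)

def gaussian_baseline(reference, sigma, rng):
    """Synthetic baseline: the reference set with Gaussian noise of known scale."""
    return reference + sigma * rng.standard_normal(reference.shape)

rng = np.random.default_rng(0)
reference = rng.standard_normal((200, 8))   # stand-in descriptors
radii = np.linspace(0.1, 3.0, 10)
for sigma in (0.1, 0.5, 1.0):
    baseline = gaussian_baseline(reference, sigma, rng)
    curve = [precision_recall(baseline, reference, r) for r in radii]
```

Sweeping `radius` traces out one PR curve per noise level. A generative model's curve can then be placed between these anchors to express its quality as an equivalent perturbation scale.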

5. Software and Implementation

AGeDi is implemented as an open-source, extensible software package (see https://github.com/nronne/agedi) (Rønne et al., 24 Jul 2025). The software provides:

AtomsGraph class for dynamic graph construction and structure representation.
ScoreModel class for equivariant GNN-based prediction of continuous and discrete score functions.
Noiser class with utilities for forward and reverse diffusion, as well as specialized loss functions.
Diffusion class integrating training and sampling workflows.
Interoperability with external simulation environments (e.g., ASE) for downstream evaluation or integration.
Pretrained checkpoints and documentation for standard datasets (QCD, C2DB), facilitating further extension or benchmarking.

6. Methodological Context and Relation to Other Approaches

AGeDi shares conceptual similarities with other advanced atomistic generative models:

Non-Autoregressive Joint Generation: Unlike autoregressive models that build structures atom-by-atom, AGeDi generates all atoms simultaneously, capturing global dependencies (Lin et al., 2022).
Unified Treatment of Continuous and Discrete Variables: Models such as MUDiff and EQGAT-diff also employ hybrid continuous/discrete diffusion processes for molecule generation (Hua et al., 2023, Le et al., 2023).
Latent and Conditional Generation: Latent space and symmetrically constrained representations, as in All-atom Diffusion Transformers (ADiT) and space-group-aware methods, enable scale-bridging and explicit incorporation of crystallographic constraints (Joshi et al., 5 Mar 2025, Chang et al., 16 May 2025).
Evaluation Protocols: The PR framework and information-theoretic metrics mirror the use of the Fréchet Inception Distance and analogous embedding-based assessments in other generative domains.

7. Impact and Outlook

AGeDi frameworks offer fundamental advances for atomistic inverse design:

Generalization Ability: Interpolation of atomic type embeddings enables controlled extrapolation to previously unseen compositions, a critical advantage for generative discovery of multi-component clusters and materials.
Conditional and Targeted Generation: Classifier-free guidance supports symmetry-aware and property-targeted structure generation, relevant for materials with required electronic, structural, or catalytic properties.
Integration into Materials Discovery Pipelines: Open-source, modular implementations facilitate adaptation, benchmarking, and extension to new domains.
Benchmarking Rigor: Precision–recall evaluations, synthetic baselines, and chemically validated metrics provide robust tools for performance analysis and model comparison.

The AGeDi approach combines physically grounded score-based diffusion with a flexible software implementation, and supports the broader adoption of generative models for atomistic structure prediction, materials design, and computational chemistry (Rønne et al., 24 Jul 2025).