Score-Based Diffusion for Atomic Positions

Updated 26 July 2025
  • Score-based diffusion for atomic positions is a generative modeling method that uses the gradient of log-probability densities to sample and denoise high-dimensional atomic configurations.
  • It leverages stochastic differential equations and symmetry-aware neural networks to accurately model molecular, crystalline, and condensed matter systems.
  • Practical applications include accelerating molecular dynamics, enhancing structure denoising, and enabling generative materials discovery with improved evaluation metrics.

Score-based diffusion for atomic positions is a methodology in generative modeling that represents high-dimensional distributions over atomic configurations through the gradient (score) of the log-probability density. By leveraging stochastic differential equations (SDEs) or Markov chains, these methods enable the sampling, denoising, or evolution of atomic positions in molecules, crystals, condensed matter, and other atomically resolved systems. Key advances integrate physical symmetries (e.g., SE(3), space group equivariance), geometric invariants, and joint modeling of atomic positions, types, and lattice structure. Recent score-based diffusion models have shown compelling results in accelerating molecular dynamics, denoising atomic structures, generative materials discovery, and scalable multiscale modeling.

1. Mathematical Framework of Score-Based Diffusion for Atomic Positions

Score-based diffusion models define a parameterized stochastic process, most commonly via an SDE of the form

$$d\mathbf{x}(t) = f(\mathbf{x}(t), t)\,dt + g(t)\,d\mathbf{w}_t,$$

where $\mathbf{x}$ is the vector of atomic positions and $d\mathbf{w}_t$ denotes standard Brownian increments. The forward (noising) process maps ground-truth atomic configurations $\mathbf{x}_0$ to increasingly “noisy” configurations $\mathbf{x}_T$ as $t \to T$. The core target is the score function

$$\mathbf{s}(\mathbf{x}, t) = \nabla_{\mathbf{x}} \log p_t(\mathbf{x}),$$

where $p_t(\mathbf{x})$ is the time-dependent marginal distribution over (possibly noisy) atomic positions. The reverse (denoising or generative) process samples new $\mathbf{x}_0$ by integrating the time-reversed SDE from $t = T$ back to $t = 0$. Its drift is the forward drift minus the score scaled by $g^2(t)$:

$$d\mathbf{x} = \left[f(\mathbf{x}, t) - g^2(t)\,\mathbf{s}(\mathbf{x}, t)\right] dt + g(t)\,d\bar{\mathbf{w}}_t,$$

where $\bar{\mathbf{w}}_t$ is a reverse-time Brownian motion.
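The reverse process can be illustrated with a minimal Python sketch. It integrates the time-reversed SDE with an Euler-Maruyama scheme under deliberately simple assumptions. First, $f = 0$ and $g$ is constant, a toy variance-exploding choice. Second, every coordinate of $\mathbf{x}_0$ is drawn independently from a Gaussian $N(\mu, s_0^2)$, so the exact score is available in closed form and no network is needed. The names `reverse_sde_sample` and `gaussian_score` are hypothetical.

```python
import math
import random


def reverse_sde_sample(score, x_T, T, g, n_steps, rng):
    """Integrate dx = [f - g^2 s] dt + g dw_bar from t = T back to t = 0.

    Assumes f = 0 and a constant diffusion coefficient g. `score(x, t)` returns
    the score for a single coordinate; coordinates are treated independently.
    """
    dt = T / n_steps
    x = list(x_T)
    t = T
    for _ in range(n_steps):
        for i, xi in enumerate(x):
            # One Euler-Maruyama step backward in time: with f = 0 the reverse drift
            # is -g^2 * s, so moving from t to t - dt adds g^2 * s * dt plus fresh noise.
            x[i] = xi + g * g * score(xi, t) * dt + g * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t -= dt
    return x


# Toy "atomic coordinates": each coordinate of x_0 is drawn from N(MU, S0^2).
MU, S0, G, T = 1.0, 0.5, 2.0, 1.0


def gaussian_score(x, t):
    # With f = 0 and constant G, p_t = N(MU, S0^2 + G^2 t), so this score is exact.
    return -(x - MU) / (S0 ** 2 + G ** 2 * t)


rng = random.Random(0)
# Start from the terminal marginal p_T = N(MU, S0^2 + G^2 T).
x_T = [rng.gauss(MU, math.sqrt(S0 ** 2 + G ** 2 * T)) for _ in range(2000)]
samples = reverse_sde_sample(gaussian_score, x_T, T, G, 400, rng)
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(f"mean={mean:.3f} std={std:.3f}")  # should be close to MU = 1.0 and S0 = 0.5
```

In an actual model, `gaussian_score` would be replaced by the trained network $s_\theta$, and each update would act on the full $3N$-dimensional position vector.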

In molecular and condensed matter systems, atoms are often subject to further symmetries or constraints. The domain may be a Euclidean space, a torus (periodic fractional coordinates), a space group Wyckoff manifold, or a product of Euclidean and Lie group factors. For physical plausibility, $\mathbf{s}(\mathbf{x}, t)$ must be invariant or equivariant under these group actions.

The score function is parameterized (typically as an SE(3)-equivariant GNN, transformer, or specialized neural field) and trained using the denoising score matching (DSM) loss

$$\mathcal{L}_\text{DSM}(\theta) = \mathbb{E}_{t, \mathbf{x}_0, \mathbf{x}_t}\left[\lambda(t)\,\big\|s_\theta(\mathbf{x}_t, t) - \nabla_{\mathbf{x}_t} \log p_{0t}(\mathbf{x}_t \mid \mathbf{x}_0)\big\|_2^2\right],$$

where $\lambda(t)$ weights time steps, $p_{0t}(\mathbf{x}_t \mid \mathbf{x}_0)$ is the forward process transition kernel, and $s_\theta$ is the model’s score prediction.
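For a Gaussian perturbation kernel, the DSM loss can be estimated directly, as the minimal Python sketch below shows. It assumes scalar coordinates, a single noise level $\sigma$, the kernel $x_t = x_0 + \sigma\epsilon$, and the weighting $\lambda = \sigma^2$. The toy data are drawn from $N(0, 1)$, so the marginal score at noise level $\sigma$ is known exactly. The function names are hypothetical.

```python
import random


def dsm_loss(score, x0_samples, sigma, rng):
    """Monte Carlo DSM loss at one noise level, for scalar coordinates.

    Uses the Gaussian kernel x_t = x_0 + sigma * eps, whose conditional score is
    grad log p_{0t}(x_t | x_0) = -(x_t - x_0) / sigma^2 = -eps / sigma,
    and the weighting lambda = sigma^2.
    """
    total = 0.0
    for x0 in x0_samples:
        eps = rng.gauss(0.0, 1.0)
        xt = x0 + sigma * eps
        target = -eps / sigma
        total += sigma ** 2 * (score(xt, sigma) - target) ** 2
    return total / len(x0_samples)


def true_score(x, sigma):
    # Marginal score when x_0 ~ N(0, 1): the noisy marginal is N(0, 1 + sigma^2).
    return -x / (1.0 + sigma ** 2)


def zero_score(x, sigma):
    # Uninformative baseline "model".
    return 0.0


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(20000)]
loss_true = dsm_loss(true_score, data, 1.0, rng)
loss_zero = dsm_loss(zero_score, data, 1.0, rng)
```

Even the exact marginal score leaves a nonzero loss (about 0.5 in this setting). DSM differs from regression onto the true marginal score only by a constant independent of $\theta$, so the exact score still attains the minimum.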

For systems whose coordinates live on non-Euclidean manifolds (e.g., hypertori for crystals), noise injection and score matching are performed using wrapped normal or group-covariant kernels. Auxiliary variables (e.g., velocities) may also be introduced to “trivialize” the geometry onto a flat latent space (Cornet et al., 4 Jul 2025).
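On a torus, the wrapped normal kernel is a sum of Gaussians over periodic images, and its score is the weight-averaged Gaussian score over those images. The minimal Python sketch below covers one fractional coordinate on $[0, 1)$. It assumes an isotropic kernel and truncates the image sum to $|k| \le 5$, which is accurate when $\sigma$ is small compared with the period. `wrapped_normal_score` is a hypothetical name, not a function from the cited works.

```python
import math


def wrapped_normal_score(xt, x0, sigma, n_images=5):
    """d/dx_t log p(x_t | x_0) for a wrapped normal on the unit circle [0, 1).

    p(x_t | x_0) is proportional to sum_k exp(-(x_t - x_0 + k)^2 / (2 sigma^2)),
    with the image sum truncated to k in [-n_images, n_images].
    """
    d = xt - x0
    d -= math.floor(d + 0.5)  # nearest-image displacement in [-0.5, 0.5)
    us = [d + k for k in range(-n_images, n_images + 1)]
    logw = [-u * u / (2.0 * sigma ** 2) for u in us]
    m = max(logw)  # log-sum-exp shift avoids underflow when sigma is small
    weights = [math.exp(lw - m) for lw in logw]
    z = sum(weights)
    # Weight-averaged Gaussian score over the periodic images.
    return sum(w * (-u / sigma ** 2) for w, u in zip(weights, us)) / z


# Small displacement inside the cell: close to the plain Gaussian score -d / sigma^2.
inside = wrapped_normal_score(0.52, 0.50, 0.05)
# Across the cell boundary: the nearest image of x0 = 0.01 is 1.01, so the score is positive.
across = wrapped_normal_score(0.99, 0.01, 0.05)
# Periodicity: shifting x_t by one full period leaves the score unchanged.
periodic = wrapped_normal_score(1.99, 0.01, 0.05)
```

The boundary case shows why a plain Gaussian score is wrong on the torus. It would push $x_t = 0.99$ down toward $0.01$, whereas the wrapped score pushes it up toward the nearby periodic image.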

2. Physical Symmetry and Equivariance in Atomic Position Diffusion

Physical distributions over atomic positions often respect spatial, chemical, or crystallographic symmetries:

  • Spatial (SE(3)) equivariance: Model outputs must transform consistently under rotation and translation of the atomic system (Wu et al., 2022, Hsu et al., 2023).
  • Periodic translation (fractional coordinates on the torus): For crystals, fractional coordinates lie in $[0,1)^3$ modulo one, requiring periodic kernels and wrapped normal noise (Jiao et al., 2023, Cornet et al., 4 Jul 2025).
  • Space group equivariance: For crystals, diffusion on Wyckoff positions within symmetry-labeled asymmetric units must be invariant under the discrete isometries of the space group $G$. This motivates the space group wrapped normal (SGWN) kernel

$$p(x_t \mid x_0) \propto \sum_{g \in G} \exp\left(-\|x_t - g\cdot x_0\|^2 / 2\sigma_t^2\right),$$

with learned score fields in the tangent spaces of the Wyckoff manifolds (Chang et al., 16 May 2025).

  • Graph-based connectivity: For molecular or protein systems, node (atom) features, edge features, and message passing must encode bond structure, local environments, and auxiliary coarse-grained (CG) variables (Liu et al., 2023).
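The invariance built into the SGWN kernel can be checked numerically on a toy case. The minimal Python sketch below assumes the simple space group P-1 (identity plus inversion) and truncates lattice translations to $|k_i| \le 2$. It also measures distances directly in fractional units, ignoring the lattice metric. It is a hypothetical illustration of the orbit sum, not the SGEquiDiff implementation, which works on Wyckoff manifolds.

```python
import itertools
import math

# Hypothetical toy space group P-1 (identity and inversion through the origin),
# written as (rotation matrix, fractional translation) pairs.
P_1BAR = [
    (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0.0, 0.0, 0.0)),
    (((-1, 0, 0), (0, -1, 0), (0, 0, -1)), (0.0, 0.0, 0.0)),
]


def apply_op(op, x):
    """Apply a symmetry operation (R, tau) to a fractional coordinate: R x + tau."""
    rot, trans = op
    return tuple(sum(rot[i][j] * x[j] for j in range(3)) + trans[i] for i in range(3))


def sgwn_kernel(xt, x0, ops, sigma, n_images=2):
    """Unnormalized space group wrapped normal density p(x_t | x_0).

    Sums an isotropic Gaussian over the orbit {g . x0 + k} of x0 under the point
    operations `ops` and integer lattice translations k (|k_i| <= n_images).
    """
    total = 0.0
    for op in ops:
        gx0 = apply_op(op, x0)
        for k in itertools.product(range(-n_images, n_images + 1), repeat=3):
            sq = sum((xt[i] - gx0[i] - k[i]) ** 2 for i in range(3))
            total += math.exp(-sq / (2.0 * sigma ** 2))
    return total


xt = (0.22, 0.31, 0.40)
x0 = (0.20, 0.30, 0.40)
base = sgwn_kernel(xt, x0, P_1BAR, 0.1)
# Replacing x0 by a symmetry-equivalent position should not change the kernel.
inverted = sgwn_kernel(xt, apply_op(P_1BAR[1], x0), P_1BAR, 0.1)
shifted = sgwn_kernel(xt, (x0[0] + 1.0, x0[1], x0[2]), P_1BAR, 0.1)
```

Summing over the whole orbit makes the kernel depend only on the equivalence class of $x_0$. Inversion and lattice translations of $x_0$ therefore leave it unchanged, up to the negligible truncation of far images.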

By directly building E(3)-, torus-, and space group–equivariant architectures and learning on fractional and geometric representations, modern approaches achieve invariance and constraint satisfaction critical for chemical and physical plausibility (Jiao et al., 2023, Klipfel et al., 2023, Cornet et al., 4 Jul 2025, Chang et al., 16 May 2025).

3. Specialized Noise Schedules and Conditional Processes

Forward diffusion processes for atomic positions are tailored to the observed data modality and physical prior. Notable strategies include:

  • Conditional noise schedules: DiffMD modulates the forward process noise by instantaneous atomic accelerations, $\sigma_a^{(t,s)}$, to reflect the kinetic proximity of MD frames, yielding more physically realistic sampling and enhancing simulation efficiency (Wu et al., 2022).
  • Wrapped normal and SDE generalization: For periodic domains, wrapped normal noise is used on the torus, ensuring the score-based process samples valid periodic atomic arrangements (Jiao et al., 2023, Cornet et al., 4 Jul 2025).
  • Shifting distributions for molecular conformation: SDDiff introduces a closed-form shifting score that transitions the inter-atomic distance/noise distribution from Gaussian to Maxwell-Boltzmann as the perturbation increases, better modeling the physics of molecular disassembly (Zhou et al., 2023).
  • Voxel-based, “grand canonical” representations: To address local minima in particle-based score-driven simulated annealing, voxel grid encodings with “smearing” of atomic positions permit creation/annihilation of atoms and more robust navigation of the configurational landscape for crystals and grain boundaries (Lei et al., 28 Aug 2024).
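A voxel representation replaces a list of atoms with a smeared density on a fixed grid. The minimal Python sketch below shows one common way to build such a density. It assumes Gaussian smearing of width `width` on a periodic cubic box with the minimum-image convention. The exact smearing used by Lei et al. may differ, and all names are hypothetical. The Gaussians are normalized, so the density integrates to the number of atoms.

```python
import math


def smear_to_grid(positions, box, n_vox, width):
    """Gaussian-smeared atomic density on a periodic cubic voxel grid.

    positions: (x, y, z) Cartesian coordinates in a cubic box of side `box`.
    Each atom contributes a normalized 3D Gaussian of standard deviation `width`,
    using minimum-image displacements along each axis. Returns a flat list of
    n_vox**3 voxel densities (atoms per unit volume).
    """
    h = box / n_vox
    norm = (2.0 * math.pi * width ** 2) ** -1.5
    centers = [(i + 0.5) * h for i in range(n_vox)]
    grid = [0.0] * n_vox ** 3
    for atom in positions:
        # The 3D Gaussian is separable, so compute one factor per axis.
        fx, fy, fz = (
            [math.exp(-((c - p) - box * round((c - p) / box)) ** 2 / (2.0 * width ** 2))
             for c in centers]
            for p in atom
        )
        idx = 0
        for i in range(n_vox):
            for j in range(n_vox):
                for k in range(n_vox):
                    grid[idx] += norm * fx[i] * fy[j] * fz[k]
                    idx += 1
    return grid


BOX, N_VOX, WIDTH = 5.0, 25, 0.4
atoms = [(1.0, 1.0, 1.0), (2.5, 2.5, 2.5), (4.9, 0.1, 3.0)]  # last atom sits near the box edge
grid = smear_to_grid(atoms, BOX, N_VOX, WIDTH)
total = sum(grid) * (BOX / N_VOX) ** 3  # integrated density, approximately len(atoms)
```

The tensor shape depends only on the grid, not on the number of atoms. Inserting or deleting an atom changes voxel values but not the shape, which is what lets a voxel model create or annihilate atoms.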

During training, the loss weighting $\lambda(t)$ and the choice of noise levels are tuned for domain-specific convergence and fidelity (Mal et al., 12 May 2025, Rønne et al., 24 Jul 2025).
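A widely used default for these two choices pairs geometrically spaced noise levels with the weighting $\lambda(\sigma) = \sigma^2$. This default comes from earlier noise-conditional score models and is not necessarily the configuration used by the works cited above. The minimal Python sketch below shows the schedule and why that weighting is convenient. For a Gaussian kernel, the DSM target $-\epsilon/\sigma$ has expected squared norm $d/\sigma^2$, so multiplying by $\sigma^2$ puts every noise level on the same scale. Function names are hypothetical.

```python
def geometric_sigmas(sigma_min, sigma_max, n_levels):
    """Noise levels spaced geometrically from sigma_min to sigma_max (inclusive)."""
    ratio = (sigma_max / sigma_min) ** (1.0 / (n_levels - 1))
    return [sigma_min * ratio ** i for i in range(n_levels)]


def loss_weight(sigma):
    """lambda(sigma) = sigma^2, a common default for Gaussian perturbation kernels."""
    return sigma ** 2


def expected_weighted_target(sigma, dim):
    # For x_t = x_0 + sigma * eps the DSM target is -eps / sigma, with
    # E||eps / sigma||^2 = dim / sigma^2; the sigma^2 weight cancels the 1 / sigma^2.
    return loss_weight(sigma) * dim / sigma ** 2


sigmas = geometric_sigmas(0.01, 1.0, 5)  # approximately [0.01, 0.0316, 0.1, 0.316, 1.0]
scales = [expected_weighted_target(s, dim=3 * 10) for s in sigmas]  # 10 atoms in 3D
```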

4. Integrated Architectural and Algorithmic Innovations

Several architectural advances underpin the effectiveness of score-based diffusion for atomic positions:

  • Equivariant Geometric Transformers and GNNs: To exploit rotational, translational, and periodic invariances, EGTs or E(3)-GNNs incorporate atom-to-atom distance, directionality, and motion via invariant bases (such as 3D spherical Fourier-Bessel expansions) in their self-attention or message-passing (Wu et al., 2022). These networks serve as physically consistent score estimators.
  • Unified, matrix-based, or autoregressive frameworks: DiffCrysGen organizes all discrete structural data as invertible real-space crystallographic representations (IRCR), processed end-to-end in a single model to link atom types, positions, and lattice parameters (Mal et al., 12 May 2025). SGEquiDiff samples tensors of attributes—lattice, Wyckoff positions, types, and atomic coordinates—in a conditional autoregressive sequence before launching a group-equivariant diffusion trajectory (Chang et al., 16 May 2025).
  • Constraint and manifold-conditioned denoising: In generalized backmapping, BackDiff imposes auxiliary constraints via explicit gradients or projected denoising steps so that the generated all-atom structure systematically aligns with required coarse-grained variables or chemical features (Liu et al., 2023).
  • Score Fokker-Planck regularization: FP-Diffusion augments DSM with explicit regularization of the score’s time evolution so that the estimated field satisfies the dynamic consistency imposed by the Fokker–Planck equation, a critical condition for physical force fields and conservativity in atomic position generation (Lai et al., 2022).
  • Joint discrete–continuous diffusion: Atomistic Generative Diffusion (AGeDi) couples continuous-time Markov diffusion on atomic types (for discrete chemical species) with score-based diffusion on positions, supporting interpolation and classifier-free guidance for desired symmetries (Rønne et al., 24 Jul 2025).
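The key property of an equivariant score estimator can be shown with a much simpler construction than the EGTs described above. The minimal Python sketch below uses an EGNN-style update (a swapped-in technique) in which each atom's score is a sum of relative position vectors scaled by an invariant function of pair distance. The radial function is a hypothetical fixed stand-in for a learned network. By construction, the output rotates with the input, ignores global translations, and sums to zero over atoms.

```python
import math


def radial_weight(r2):
    # Hypothetical fixed radial function standing in for a learned MLP of |x_i - x_j|^2.
    return math.exp(-r2) - 0.1


def equivariant_score(positions):
    """EGNN-style score estimate s_i = sum_{j != i} (x_i - x_j) * phi(|x_i - x_j|^2)."""
    n = len(positions)
    scores = []
    for i in range(n):
        s = [0.0, 0.0, 0.0]
        for j in range(n):
            if i == j:
                continue
            diff = [positions[i][a] - positions[j][a] for a in range(3)]
            w = radial_weight(sum(d * d for d in diff))  # rotation- and translation-invariant
            for a in range(3):
                s[a] += w * diff[a]  # relative vectors carry the equivariance
        scores.append(s)
    return scores


def rotate_z(v, theta):
    """Rotate a 3-vector about the z axis by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1], v[2]]


pos = [[0.0, 0.0, 0.0], [1.0, 0.2, -0.3], [0.4, 1.1, 0.5], [-0.7, 0.3, 0.9]]
theta = 0.7
score = equivariant_score(pos)
score_rot = equivariant_score([rotate_z(p, theta) for p in pos])
score_shift = equivariant_score([[p[0] + 2.0, p[1] - 1.0, p[2] + 0.5] for p in pos])
```

Here the weight depends only on pair distances, so this particular field is the gradient of a pairwise potential $\sum_{i<j} F(r_{ij}^2)$ with $F' = \phi/2$. It is therefore conservative, which is one simple way to obtain the conservative fields that the Fokker–Planck regularization above is concerned with.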

5. Applications: Dynamics, Structure Generation, and Denoising

Table: Domains and tasks addressed

| Domain | Atomic representation | Primary task |
|---|---|---|
| Molecules | 3D Cartesian | MD trajectory prediction, conformer sampling |
| Crystals | Fractional/torus, Wyckoff | Crystal generation, CSP, property-optimized design |
| Condensed matter | Cartesian, fractional | Thermal noise denoising, defect identification |
| Proteins | Atomistic CG and all-atom | Backmapping CG models to full atomic ensembles |
| Materials | Voxel, point cloud | Grain boundary structure, phase exploration |

  • Molecular dynamics acceleration: Generative models (DiffMD, Score Dynamics) bypass repeated force calculations and time integration, yielding direct predictions of atomic positions at large effective timesteps and spanning long-timescale conformational space (Wu et al., 2022, Hsu et al., 2023).
  • Crystal and material structure generation: Score-based models produce physically plausible, high-symmetry, and chemically valid crystals, with applications in rare-earth-free magnet design, CSP, de novo structure enumeration, and inverse property-driven search (Mal et al., 12 May 2025, Chang et al., 16 May 2025, Klipfel et al., 2023).
  • Atomic denoising and structure identification: Iterative denoising via learned score fields maps thermally perturbed atomic configurations back to their noise-free crystalline motifs, markedly improving downstream classification and defect analysis (Hsu et al., 2022).
  • Backmapping of coarse-grained models: Conditional score-based diffusion, manifold constraints, and self-supervised learning admit architecture-agnostic reconstructions of protein all-atom details from varied CG representations (Liu et al., 2023).
  • Atomic type interpolation and compositional flexibility: By allowing interpolation in atom type embedding space, models can synthesize chemically novel or out-of-sample compositions (e.g., bimetallic clusters) despite training only on single-element systems (Rønne et al., 24 Jul 2025).
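Constraint-aware backmapping can be sketched as a projected denoising loop: after each reverse step, the atoms are projected back onto the set satisfying the coarse-grained constraint. The minimal Python sketch below assumes the CG variable is the unweighted centroid of a group of atoms. Under that assumption the Euclidean projection is simply a uniform shift. The random jitter is a placeholder for a trained reverse-diffusion update. BackDiff's actual constraint formulation may differ, and the names are hypothetical.

```python
import random


def project_to_cg_center(atom_positions, cg_bead):
    """Project atoms onto the set {x : mean(x) = cg_bead}.

    With equal weights, the closest point (in summed squared distance) satisfying
    the centroid constraint is a uniform shift by cg_bead - mean(x).
    """
    n = len(atom_positions)
    centroid = [sum(p[a] for p in atom_positions) / n for a in range(3)]
    shift = [cg_bead[a] - centroid[a] for a in range(3)]
    return [[p[a] + shift[a] for a in range(3)] for p in atom_positions]


rng = random.Random(1)
atoms = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5)]
bead = [0.5, -0.2, 1.0]
for step in range(10):
    # Placeholder for one reverse-diffusion update of the all-atom positions.
    atoms = [[x + 0.1 * rng.gauss(0.0, 1.0) for x in p] for p in atoms]
    # Enforce the CG constraint after every step.
    atoms = project_to_cg_center(atoms, bead)
centroid = [sum(p[a] for p in atoms) / len(atoms) for a in range(3)]
```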

6. Performance, Generalization, and Evaluation Metrics

Empirical evaluation demonstrates state-of-the-art or improved performance in multiple regimes:

  • Root-mean-square error (RMSE) and match rate (MR): Used to quantify structural accuracy. DiffMD achieves the lowest ARMSE among the compared methods on MD17 (Wu et al., 2022); DiffCSP reaches ≈98.6% MR on Perov-5, with low RMSE (Jiao et al., 2023).
  • Energy accuracy: DP-CDVAE reports a 68.1 meV/atom smaller energy difference between generated and DFT-relaxed structures than the prior CDVAE (Pakornchote et al., 2023). DiffCrysGen yields ≈65% success in generating rare-earth-free magnets that meet property and stability criteria, versus ≈6% for the baseline (Mal et al., 12 May 2025).
  • Symmetry and diversity: DiffCrysGen and SGEquiDiff produce balanced high-symmetry space group distributions; SGEquiDiff attains improved uniqueness and stability of crystal structures (Chang et al., 16 May 2025).
  • Denoising/classification accuracy: The score-based denoiser drastically improves disorder classification in noisy solids, achieving near-perfect accuracy post-denoising on DC3 (Hsu et al., 2022).
  • Holistic property metrics: Evaluations use the Fréchet ALIGNN Distance (FAD) (Klipfel et al., 2023), classifier-free guidance toward target crystallographic symmetry (Rønne et al., 24 Jul 2025), and property-aligned screening on energy and magnetization (Mal et al., 12 May 2025).
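The RMSE and match-rate metrics above can be illustrated with a deliberately simplified Python sketch. It assumes that generated and reference atoms are already in one-to-one correspondence, that both share the same cubic cell, and that no alignment search is needed. Real CSP benchmarks typically rely on a structure matcher that also searches over atom permutations, lattice settings, and origin shifts (for example pymatgen's `StructureMatcher`), and normalization conventions vary. The function names here are hypothetical.

```python
import math


def periodic_rmse(frac_a, frac_b, lattice_length):
    """Per-atom RMS distance between two structures with atoms in the same order.

    Fractional displacements are wrapped to the nearest periodic image and then
    converted to Cartesian lengths using a single cubic lattice length.
    """
    total = 0.0
    for pa, pb in zip(frac_a, frac_b):
        for a, b in zip(pa, pb):
            d = a - b
            d -= round(d)  # nearest periodic image in fractional units
            total += (d * lattice_length) ** 2
    return math.sqrt(total / len(frac_a))


def match_rate(pairs, lattice_length, threshold):
    """Fraction of (generated, reference) pairs whose periodic RMSE is below threshold."""
    hits = sum(1 for gen, ref in pairs if periodic_rmse(gen, ref, lattice_length) < threshold)
    return hits / len(pairs)


ref = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
gen_exact = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.5]]   # differs only by a lattice translation
gen_off = [[0.02, 0.0, 0.0], [0.5, 0.5, 0.48]]   # small displacements
gen_bad = [[0.25, 0.0, 0.0], [0.5, 0.5, 0.5]]    # one atom far from its site
mr = match_rate([(gen_exact, ref), (gen_off, ref), (gen_bad, ref)], 4.0, 0.5)
```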

Generalization is demonstrated by transfer to new chemical compositions, handling physically distinct atomic environments, and robust denoising or backmapping across diverse data types (Hsu et al., 2022, Liu et al., 2023, Cornet et al., 4 Jul 2025).

7. Outlook and Ongoing Research Directions

Score-based diffusion for atomic positions continues to evolve:

  • Mathematical generalizations: Malliavin calculus provides explicit, theoretically grounded targets for the score function in both linear and nonlinear SDEs, suggesting enhanced rigor and tractability for model training and extension to generalized noise models (Mirafzali et al., 21 Mar 2025).
  • Flexible representations: Voxel/field-based grand canonical diffusion enables atom number fluctuations and overcomes point cloud limitations in global structure optimization and defect-rich configurations (Lei et al., 28 Aug 2024).
  • Manifold and group-valued diffusion: KLDM demonstrates performance and efficiency gains by leveraging the group structure (SO(2), hypertorus) of fractional coordinates and auxiliary velocities, with broader implications for generating manifold-valued scientific data (Cornet et al., 4 Jul 2025).
  • Regularization and physicality: Enforcing score dynamics, Fokker–Planck self-consistency, and symmetry-constrained flows remain active areas to guarantee physically meaningful, conservative vector fields and energy-respecting structure evolution (Lai et al., 2022, Chang et al., 16 May 2025).
  • Software and extensibility: Modular open-source frameworks (AGeDi) lower the barrier to deployment and experimentation, supporting extensible pipelines for a wide range of atomistic generative tasks (Rønne et al., 24 Jul 2025).

Score-based diffusion models for atomic positions thus offer a unifying paradigm for sampling, evolution, and design across molecular and materials science. Ongoing theoretical advances and empirical results demonstrate their versatility across atomistic domains.