
OXtal: Diffusion Model for CSP

Updated 10 December 2025
  • OXtal is a large-scale, all-atom diffusion model for molecular crystal structure prediction that models joint conditional distributions over intramolecular conformations and crystal packing.
  • It leverages a crystallization-mimetic S⁴ procedure and extensive data augmentation to implicitly learn periodic symmetries, enhancing packing accuracy and computational efficiency.
  • Empirical results show OXtal reaching packing recovery of up to Pac_C = 1.000 on test molecules, far above prior ML baselines, and exceeding the aggregated DFT-based average on a CCDC blind test while using fewer samples per target.

OXtal is a large-scale, all-atom diffusion model for molecular crystal structure prediction (CSP) that directly models the joint conditional distribution over both intramolecular conformations and periodic crystal packing, given an input 2D molecular graph. The model abandons explicit enforcement of crystal symmetry in network design in favor of extensive data augmentation, and introduces a crystallization-mimetic, lattice-free stoichiometric stochastic shell sampling (S⁴) procedure for scalable training. OXtal leverages a dataset of nearly 600,000 experimentally validated crystal structures and demonstrates substantial improvements in crystal packing recovery and computational efficiency compared to both ab initio ML and traditional DFT-based CSP methods (Jin et al., 7 Dec 2025).

1. Model Formulation and Architecture

OXtal is formulated as a continuous-time diffusion generative model targeting the conditional joint distribution over all-atom coordinates in molecular crystals. The crystal structure is defined as an equivalence class $[\mathcal{C}] = [(L, \mathcal{B})] \in \mathcal{M}(g)$ under translation, rotation, permutation, and supercell transformations, for an input 2D molecular graph $g$. The model seeks to learn

$$p([\mathcal{C}] \mid g) \approx \kappa_\aleph([\mathcal{C}] \mid g) \exp(-\beta \Delta G([\mathcal{C}]))$$

where $\Delta G$ is the Gibbs free energy, $\beta$ is the inverse temperature, and $\kappa_\aleph$ captures kinetic accessibility. OXtal implements the variance-exploding SDE formulation:

  • Forward: $d\mathbf{X}_t = f_t(\mathbf{X}_t)\,dt + \sigma_t\,d\mathbf{W}_t$, with $\mathbf{X}_0 \sim p_0$
  • Reverse: $d\mathbf{X}_t = [f_t(\mathbf{X}_t) - \sigma_t^2 \nabla_x \log p_t(\mathbf{X}_t)]\,dt + \sigma_t\,d\overline{\mathbf{W}}_t$

The score function $\nabla_x \log p_t$ is estimated by a denoising network $D_\theta$, trained by minimizing the loss

$$\mathcal{L}_{\rm DSM}(\theta) = \mathbb{E}_{t,\, x_0 \sim p_0,\, x_t \sim p_t(\cdot \mid x_0)} \left[\lambda(t)\,\|x_0 - D_\theta(x_t, t)\|^2\right].$$
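The denoising objective can be illustrated with a minimal sketch in plain Python. It assumes a zero-drift Gaussian perturbation $x_t = x_0 + s\,\epsilon$, where `noise_std` ($s$) is the marginal noise level at the sampled time rather than the diffusion coefficient $\sigma_t$; under that assumption the score can be read off the denoiser as $(D_\theta(x_t) - x_t)/s^2$. The function names and the `toy_denoiser` are hypothetical and stand in for the real network.

```python
import random


def perturb(x0, noise_std, rng):
    """Add isotropic Gaussian noise: x_t = x_0 + noise_std * eps."""
    return [[c + noise_std * rng.gauss(0.0, 1.0) for c in atom] for atom in x0]


def denoising_loss(denoiser, x0, xt, noise_std, weight=1.0):
    """weight * ||x0 - D(xt, noise_std)||^2 for one (x0, xt) pair."""
    x0_hat = denoiser(xt, noise_std)
    sq = sum((a - b) ** 2 for p, q in zip(x0, x0_hat) for a, b in zip(p, q))
    return weight * sq


def score_from_denoiser(denoiser, xt, noise_std):
    """Score estimate (D(xt) - xt) / noise_std^2 for Gaussian perturbations."""
    x0_hat = denoiser(xt, noise_std)
    return [[(a - b) / noise_std ** 2 for a, b in zip(p, q)]
            for p, q in zip(x0_hat, xt)]


def toy_denoiser(xt, noise_std):
    """Hypothetical stand-in: the exact denoiser for a unit-Gaussian prior."""
    return [[c / (1.0 + noise_std ** 2) for c in atom] for atom in xt]


rng = random.Random(0)
x0 = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]   # two atoms, xyz coordinates
xt = perturb(x0, noise_std=0.5, rng=rng)
loss = denoising_loss(toy_denoiser, x0, xt, noise_std=0.5)
```

In training, this per-sample loss would be averaged over noise levels and structures, with `weight` playing the role of $\lambda(t)$.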

The OXtal network (≈100M parameters) consists of:

  • Atom encoder (∼2M params): Embeds atomic identities, charges, bonds, and a reference conformer.
  • Pairformer trunk (∼25M params): Alternating single and pairwise updates via triangular self-attention, following the AlphaFold3 Pairformer design but lacking explicit equivariance.
  • Diffusion module (∼70M params): 12-block Transformer ingesting atom/pair features and noisy coordinates, predicting denoised coordinates.
  • Atom-attention decoder: Final refinement stage.

Crucially, the model omits explicit lattice vector prediction and does not hard-wire crystalline symmetry, instead learning periodicity from data.

2. Symmetry Treatment and Data Augmentation

Crystal symmetries—translations, rotations, permutation of indices, and supercell transformations—are not encoded via equivariant networks but through extensive data augmentation. At each training instance, OXtal applies:

  • Rigid SO(3) rotations (Haar measure),
  • Uniform translations in $[0, 1)^3$ (toroidal box),
  • Random unimodular $\mathrm{GL}(3, \mathbb{Z})$ cell transformations,
  • Atom index permutations.

This ensures that the model does not overfit to any canonical orientation or indexing, promoting implicit invariance to the full crystal symmetry group. No privileged frame, origin, or atom ordering is observable by the model at training time (Jin et al., 7 Dec 2025).
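A minimal sketch of these four augmentations, in plain Python, might look as follows. All function names are hypothetical. Two simplifications are assumed: the translation is applied directly to Cartesian coordinates without toroidal wrapping, and the unimodular matrix is built from integer shears only, so it always has determinant +1 (a sign flip or row swap would be needed to reach the determinant −1 half of $\mathrm{GL}(3, \mathbb{Z})$).

```python
import math
import random


def random_rotation(rng):
    """Haar-uniform rotation matrix from a normalized 4D Gaussian quaternion."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    n = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / n for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def random_unimodular(rng, n_ops=4, max_k=2):
    """Integer matrix with determinant +1 built from random elementary shears."""
    u = [[int(i == j) for j in range(3)] for i in range(3)]
    for _ in range(n_ops):
        i, j = rng.sample(range(3), 2)
        k = rng.randint(-max_k, max_k)
        u[i] = [a + k * b for a, b in zip(u[i], u[j])]  # row_i += k * row_j
    return u


def det3(m):
    """Determinant of a 3x3 matrix, used here as a sanity check."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def augment(coords, rng):
    """Rotate, translate, and permute a list of Cartesian atom coordinates."""
    rot = random_rotation(rng)
    shift = [rng.random() for _ in range(3)]
    order = list(range(len(coords)))
    rng.shuffle(order)
    moved = []
    for i in order:
        rotated = [sum(rot[r][c] * coords[i][c] for c in range(3)) for r in range(3)]
        moved.append([a + s for a, s in zip(rotated, shift)])
    return moved, order


rng = random.Random(1)
coords = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 2.0, 0.0]]
moved, order = augment(coords, rng)
cell_change = random_unimodular(rng)
```

Because rotation and translation are rigid motions, interatomic distances are unchanged by `augment`; only the frame and the atom ordering differ between draws.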

3. Stoichiometric Stochastic Shell Sampling (S⁴)

OXtal introduces S⁴, a lattice-free, locality-conserving cropping scheme designed to expose the model to both local and longer-range periodicity without requiring full supercell parameterization. Inductive crystalline patterns are thus learned efficiently at the all-atom level.

  • Algorithm:
  1. Sample a central molecule $m_c$ from the asymmetric unit.
  2. Build concentric molecular shells $\mathcal{S}_k(m_c)$ at radii $[k\,r_{\rm cut}, (k+1)\,r_{\rm cut})$.
  3. Randomly select the shell depth $K$.
  4. Stochastically crop a union of shells $V_K$ subject to a token (atom) budget $T_{\max}$.
  5. If $|V_K| > T_{\max}$, subsample from the outermost shell, preserving stoichiometry.
  • Theoretical bound: The imposed boundary-to-volume error for the local loss scales as $O(T_{\rm crop}^{-1/3})$.
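
One heuristic way to see where the $-1/3$ exponent comes from is a surface-to-volume count. This is a back-of-the-envelope argument under stated assumptions, not the paper's proof: the crop is taken to be roughly spherical with radius $R$ and uniform atom density $\rho$, and $\delta$ is an assumed interaction range within which atoms near the boundary miss some of their neighbors. All three symbols are introduced here for illustration.

$$\frac{T_{\rm boundary}}{T_{\rm crop}} \approx \frac{4\pi R^2 \delta\,\rho}{\tfrac{4}{3}\pi R^3 \rho} = \frac{3\delta}{R}, \qquad R = \left(\frac{3\,T_{\rm crop}}{4\pi\rho}\right)^{1/3} \;\Rightarrow\; \frac{T_{\rm boundary}}{T_{\rm crop}} \approx 3\delta \left(\frac{4\pi\rho}{3\,T_{\rm crop}}\right)^{1/3} = O\!\left(T_{\rm crop}^{-1/3}\right).$$

Under these assumptions, an eightfold increase in the token budget would halve the fraction of boundary-affected atoms.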

This approach aligns with physical crystallization steps—nucleation and interfacial growth—and enables OXtal to learn the energetics and kinetics across a wide spectrum of motif sizes (Jin et al., 7 Dec 2025).
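The five S⁴ steps can be sketched in plain Python as below. This is an illustrative reading of the procedure, not the authors' implementation: molecules are represented by hypothetical dicts with `centroid`, `n_atoms`, and `species` keys, shells are assigned by centroid distance, and stoichiometry is approximated by keeping the same fraction of each species in the shell that overflows the budget.

```python
import math
import random
from collections import defaultdict


def n_atoms(molecules, ids):
    """Total atom count of the molecules with the given indices."""
    return sum(molecules[i]["n_atoms"] for i in ids)


def subsample_shell(molecules, shell, budget, rng):
    """Keep the same fraction of every species so the shell's ratio is roughly kept."""
    groups = defaultdict(list)
    for i in shell:
        groups[molecules[i]["species"]].append(i)
    for members in groups.values():
        rng.shuffle(members)
    steps = max((len(m) for m in groups.values()), default=1)
    for s in range(steps, -1, -1):  # try the largest fraction first
        picked = [i for m in groups.values() for i in m[: s * len(m) // steps]]
        if n_atoms(molecules, picked) <= budget:
            return picked
    return []


def s4_crop(molecules, asym_ids, r_cut, max_depth, t_max, rng):
    """Return indices of a shell-based crop around a random central molecule."""
    center = rng.choice(asym_ids)                       # step 1
    origin = molecules[center]["centroid"]
    shells = defaultdict(list)                          # step 2
    for idx, mol in enumerate(molecules):
        if idx != center:
            shells[int(math.dist(origin, mol["centroid"]) // r_cut)].append(idx)
    depth = rng.randint(0, max_depth)                   # step 3
    kept = [center]
    for k in range(depth + 1):                          # step 4
        shell = shells.get(k, [])
        if n_atoms(molecules, kept + shell) <= t_max:
            kept += shell
            continue
        budget = t_max - n_atoms(molecules, kept)       # step 5
        kept += subsample_shell(molecules, shell, budget, rng)
        break
    return kept


# Toy input: ten molecules on a line, alternating two species.
toy = [{"centroid": [float(i), 0.0, 0.0],
        "n_atoms": 3 if i % 2 == 0 else 2,
        "species": "A" if i % 2 == 0 else "B"} for i in range(10)]
crop = s4_crop(toy, asym_ids=[0], r_cut=2.5, max_depth=3, t_max=12,
               rng=random.Random(0))
```

The central molecule is always retained, and the crop never exceeds `t_max` atoms as long as the central molecule itself fits within the budget.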

4. Dataset Construction and Training Protocol

The model is trained on 594,202 structures from the Cambridge Structural Database (CSD, release ≤ May 2025), filtered for experimental completeness, absence of polymers, known space group, and RDKit sanitizability. Deduplication based on RMSD$_{15} < 0.25$ Å prunes near-identical polymorphs. The dataset covers rigid and flexible molecules, co-crystals, solvates, and salts.

  • Batching: Each batch is an S⁴ crop (max 640 atoms/tokens). Sampling mixes S⁴ and standard $k$NN cropping.
  • Optimization: Adam ($\beta_1 = 0.9$, $\beta_2 = 0.95$), initial LR $1.8 \times 10^{-3}$, linear warmup (1,000 steps), and exponential decay.
  • Loss function: Composite of MSE on rigid-aligned coordinates, a local distance-test score (sLDDT), and a distance-based loss on predicted vs. true interatomic distances:

$$\mathcal{L}(\theta) = \mathbb{E}_{t, x_0, x_t}\left[\|\hat{x}_0 - x_0^{\rm align}\|^2 + \mathcal{L}_{\rm sLDDT}(\hat{x}_0, x_0^{\rm align})\right] + \lambda_{\rm dist}\, \mathcal{L}_{\rm dist}(\hat{d}, d)$$
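
The three terms of this loss can be sketched in plain Python for a single structure. Several details are assumptions rather than facts from the source: the sLDDT term follows the AlphaFold3-style smooth LDDT (sigmoid agreement at 0.5, 1, 2, and 4 Å over reference pairs within a 15 Å cutoff), the distance term is a plain MSE on pair distances computed from coordinates, and the reference is taken to be already rigid-aligned to the prediction. All function names are hypothetical, and at least two atoms are assumed.

```python
import math


def pair_distances(x):
    """All i<j interatomic distances for a list of xyz coordinates."""
    return [math.dist(x[i], x[j])
            for i in range(len(x)) for j in range(i + 1, len(x))]


def smooth_lddt_loss(pred, true, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """1 - mean sigmoid agreement over reference pairs closer than `cutoff`."""
    scores = []
    for d_pred, d_true in zip(pair_distances(pred), pair_distances(true)):
        if d_true < cutoff:
            err = abs(d_pred - d_true)
            # sigmoid(t - err), averaged over the distance thresholds
            scores.append(sum(1.0 / (1.0 + math.exp(err - t)) for t in thresholds)
                          / len(thresholds))
    return 1.0 - sum(scores) / len(scores) if scores else 0.0


def distance_loss(pred, true):
    """Mean squared error between predicted and reference pair distances."""
    d_pred, d_true = pair_distances(pred), pair_distances(true)
    return sum((a - b) ** 2 for a, b in zip(d_pred, d_true)) / len(d_pred)


def composite_loss(pred, true_aligned, lambda_dist=1.0):
    """Coordinate MSE + smooth-LDDT term + weighted distance term."""
    mse = sum((a - b) ** 2
              for p, q in zip(pred, true_aligned) for a, b in zip(p, q))
    return (mse + smooth_lddt_loss(pred, true_aligned)
            + lambda_dist * distance_loss(pred, true_aligned))


true = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0]]
pred = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 1.5, 0.0]]  # one atom displaced
value = composite_loss(pred, true)
```

The sLDDT and distance terms depend only on interatomic distances, so they are unaffected by rigid motions of the prediction; only the coordinate MSE requires the alignment step.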

OXtal was trained for approximately 110,000 steps on large-scale GPU clusters (L40S/H100) (Jin et al., 7 Dec 2025).
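
The learning-rate schedule described above (linear warmup over 1,000 steps to $1.8 \times 10^{-3}$, then exponential decay) can be written as a small function. The decay factor and decay interval are not given in the source, so `decay_rate` and `decay_every` below are placeholder assumptions, as is the function name.

```python
def learning_rate(step, peak_lr=1.8e-3, warmup_steps=1000,
                  decay_rate=0.95, decay_every=5000):
    """Linear warmup to peak_lr, then exponential decay.

    decay_rate and decay_every are assumed values for illustration only.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr * decay_rate ** ((step - warmup_steps) / decay_every)


# Learning rate at a few points of a ~110,000-step run.
schedule = [learning_rate(s) for s in (0, 500, 1000, 50_000, 110_000)]
```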

5. Metrics and Empirical Results

Model quality is assessed via several geometric and packing-specific metrics:

| Metric | Description |
|---|---|
| $\mathrm{Col}_S$ (Collision) | Fraction of samples with intermolecular clashes (distance $<$ sum of vdW radii $- 0.7$ Å). Lower is better. |
| Pac$_S$, Pac$_C$ (Packing) | Fraction of samples/crystals with packing-similar clusters (COMPACK, RMSD$_{15} < 2$ Å) |
| Rec$_S$, Rec$_C$ (Recovery) | Fraction of samples/crystals with conformer recovery (RMSD$_1 < 0.5$ Å, non-H atoms) |
| $\widetilde{\mathrm{Sol}}_C$ | Approx. solved: collision-free, packing-similar, and RMSD$_{15} < 2$ Å |
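
As an illustration of the collision criterion, the sketch below flags a sample when any pair of atoms from different molecules is closer than the sum of their van der Waals radii minus 0.7 Å. The data layout (each molecule as a list of `(element, xyz)` tuples), the function names, and the small Bondi-style radius table are assumptions made for the example; periodic images are ignored.

```python
import math

# Bondi-style van der Waals radii in Å for a few elements (illustrative subset).
VDW = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def has_clash(mol_a, mol_b, tolerance=0.7):
    """True if any cross-molecule atom pair is closer than r_i + r_j - tolerance."""
    for elem_a, xyz_a in mol_a:
        for elem_b, xyz_b in mol_b:
            if math.dist(xyz_a, xyz_b) < VDW[elem_a] + VDW[elem_b] - tolerance:
                return True
    return False


def collision_rate(samples):
    """Fraction of samples (lists of molecules) with at least one clash."""
    def clashes(mols):
        return any(has_clash(mols[i], mols[j])
                   for i in range(len(mols)) for j in range(i + 1, len(mols)))
    return sum(clashes(s) for s in samples) / len(samples)


# Two single-atom "molecules" per sample: one pair too close, one well separated.
close = [[("C", (0.0, 0.0, 0.0))], [("C", (1.0, 0.0, 0.0))]]
far = [[("C", (0.0, 0.0, 0.0))], [("O", (4.0, 0.0, 0.0))]]
rate = collision_rate([close, far])
```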

Empirical comparisons on 50 rigid and 50 flexible test cases:

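| Model | Col$_S$ | Pac$_S$ | Pac$_C$ | Rec$_S$ | Rec$_C$ | Solved$_C$ |
|---|---|---|---|---|---|---|
| A-Transformer | 0.731 | 0.015 | 0.060 | 0.033 | 0.120 | 0.060 |
| AssembleFlow | 0.524 | 0.001 | 0.040 | 0.001 | 0.020 | 0.000 |
| OXtal | 0.011 | 0.873 | 1.000 | 0.737 | 0.960 | 0.300 |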

On flexible molecules, OXtal achieves Pac$_C = 0.90$ and Solved$_C = 0.22$, while the baselines fail to recover any packing-similar crystals (Pac$_C = 0$).
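
The gap between sample-level ($S$) and crystal-level ($C$) metrics can be made concrete with a short sketch. It assumes, as an interpretation not spelled out in the source, that a crystal counts as a success when at least one of its samples passes the criterion; the function name and the toy data are hypothetical.

```python
def sample_and_crystal_rates(hits):
    """hits[c][s] is True when sample s of crystal c passes the criterion."""
    n_samples = sum(len(row) for row in hits)
    sample_rate = sum(sum(row) for row in hits) / n_samples
    crystal_rate = sum(any(row) for row in hits) / len(hits)
    return sample_rate, crystal_rate


# Three crystals with two samples each (toy data, not results from the paper).
toy_hits = [[True, False], [False, False], [True, True]]
sample_rate, crystal_rate = sample_and_crystal_rates(toy_hits)
```

Under this reading the crystal-level rate can never fall below the sample-level rate, which is consistent with every row of the table above.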

In CCDC blind CSP benchmarks, OXtal (30 samples per target) attains Pac$_C = 0.833$ in blind test 5, outperforming the aggregated DFT group average (Pac$_C = 0.661$), which required ∼464 samples per target. Inference cost per crystal is ∼0.24 USD on commodity hardware, over an order of magnitude lower than DFT-based workflows (Jin et al., 7 Dec 2025).

6. Assessment, Limitations, and Prospects

OXtal demonstrates that a high-capacity, all-atom diffusion model trained on broad experimental crystal data can capture both thermodynamic (Boltzmann-like) and kinetic (crystallization-accessibility) regularities in molecular crystallization, without explicit symmetry constraints or cell parameterization. The S⁴ scheme enables scalable exposure to periodic interactions, and the model’s lack of equivariant or explicit lattice bias is offset by data augmentation and Transformer depth.

Ablation studies show that replacing S⁴ with simpler cropping degrades both conformer recovery and packing accuracy, while a reduced 50M-parameter model still outperforms prior ML CSP methods. Limitations include the absence of explicit energy ranking, local DFT refinement, and conditioning on solvent or temperature. Further advances may involve integrating energy-based re-ranking or direct prediction of cell vectors and volumes.

OXtal establishes the feasibility of highly accurate, low-cost ab initio ML CSP at the all-atom level, with a design that is well-suited for further scalability and integration into larger molecular design and screening pipelines (Jin et al., 7 Dec 2025).
