Crystalite: Efficient Diffusion Transformer for Crystals

Updated 3 April 2026
  • Crystalite is a lightweight diffusion Transformer model designed for efficient crystal modeling by embedding chemical and geometric information directly into its architecture.
  • It employs Subatomic Tokenization and the Geometry Enhancement Module (GEM) to incorporate chemical descriptors and periodic geometric bias into transformer attention, improving crystal structure prediction (CSP) performance.
  • Empirical results show that Crystalite outperforms geometry-heavy baselines in terms of match rate, RMSE, and generation speed, achieving state-of-the-art performance on multiple benchmarks.

Crystalite is a lightweight diffusion Transformer model for generative modeling of crystalline materials, designed to balance efficiency and crystalline inductive bias without the computational burden of full equivariant graph neural networks. It introduces two principal mechanisms, Subatomic Tokenization and the Geometry Enhancement Module (GEM), to incorporate chemical and geometric information directly into transformer attention. This enables high-fidelity crystal structure prediction (CSP) and de novo crystal generation with sampling efficiency superior to existing geometry-heavy baselines (Veljković et al., 2 Apr 2026).

1. Model Formulation and Architecture

Crystalite operates on three continuous input channels per crystal: chemically structured atom tokens $\mathbf{H} \in \mathbb{R}^{N \times d_H}$, fractional atom coordinates $\mathbf{F} \in [0,1)^{N \times 3}$, and a six-dimensional lattice latent $\mathbf{y} \in \mathbb{R}^6$ parameterizing the unit cell. The input sequence consists of $N$ atom tokens and a single “lattice” token, each token embedding both chemical identity and spatial information in a common hidden space of dimension $d$.

A noise-level conditioning vector $\mathbf{c}_\sigma \in \mathbb{R}^d$ is produced via a small MLP and applied to all transformer blocks via adaptive layer normalization. Internally, the architecture stacks $K$ standard Transformer blocks with multi-head self-attention, each enhanced by the Geometry Enhancement Module. Output heads produce estimates for atomic identity, positions, and lattice parameters. The model is trained in the continuous EDM (score-matching diffusion) framework, perturbing all channels jointly with isotropic Gaussian noise parameterized by $\sigma$, and optimizing a weighted combination of mean squared errors across channels: $\mathcal{L} = \lambda_H\,\mathcal{L}_H + \lambda_F\,\mathcal{L}_F + \lambda_{\mathrm{lat}}\,\mathcal{L}_{\mathrm{lat}}$ with weights $(\lambda_H, \lambda_F, \lambda_{\mathrm{lat}}) = (1, 50, 5)$. Sampling employs the EDM Heun sampler with 150 steps, optionally accelerated by channel anti-annealing to expedite convergence on slower-denoising channels.
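The weighted loss and the Heun sampling loop described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the noise schedule uses the commonly published EDM defaults (sigma_min 0.002, sigma_max 80, rho 7) as an assumption, the loss is a plain MSE that omits EDM's noise-dependent weighting and any periodic wrapping of fractional coordinates, and a closed-form Gaussian denoiser stands in for the trained network.

```python
import numpy as np

# Channel weights quoted in the text: (lambda_H, lambda_F, lambda_lat) = (1, 50, 5).
LAMBDAS = {"H": 1.0, "F": 50.0, "lat": 5.0}


def weighted_loss(pred, target, lambdas=LAMBDAS):
    """L = lambda_H * L_H + lambda_F * L_F + lambda_lat * L_lat, each term a plain MSE."""
    return sum(lambdas[k] * np.mean((pred[k] - target[k]) ** 2) for k in lambdas)


def karras_sigmas(n_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise levels from sigma_max down to sigma_min, then a final 0 (assumed EDM defaults)."""
    ramp = np.linspace(0.0, 1.0, n_steps)
    inv = sigma_max ** (1 / rho) + ramp * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))
    return np.append(inv ** rho, 0.0)


def heun_sample(denoise, x, sigmas):
    """Deterministic EDM sampler: Euler predictor followed by a Heun (trapezoidal) corrector."""
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        d_cur = (x - denoise(x, s_cur)) / s_cur
        x_next = x + (s_next - s_cur) * d_cur
        if s_next > 0:  # no corrector on the last step, where sigma reaches 0
            d_next = (x_next - denoise(x_next, s_next)) / s_next
            x_next = x + (s_next - s_cur) * 0.5 * (d_cur + d_next)
        x = x_next
    return x


# Toy stand-in for the network: the exact denoiser for zero-mean Gaussian data.
SIGMA_DATA = 0.5


def toy_denoise(x, sigma):
    return x * SIGMA_DATA**2 / (SIGMA_DATA**2 + sigma**2)


rng = np.random.default_rng(0)
sigmas = karras_sigmas(150)  # 150 sampler steps, as stated in the text
x_init = sigmas[0] * rng.standard_normal((4, 3))
x_final = heun_sample(toy_denoise, x_init, sigmas)
```

In the real model the denoiser would act on all three channels at once; the toy denoiser is used here only because its probability-flow trajectory is known in closed form, which makes the sampler easy to check.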

2. Subatomic Tokenization

Crystalite replaces one-hot, atomic-number-based atom type encoding with Subatomic Tokenization, which represents each element by a compact, chemically structured descriptor. This encoding concatenates:

  • one-hot period (7D)
  • one-hot group (19D)
  • one-hot block (4D)
  • normalized ground-state valence occupancies

Features are standardized and weighted, then optionally projected via PCA to the token dimension $d_H$ (16 in the size comparison quoted below). For a crystal with $N$ atoms, the per-atom descriptors are stacked row-wise into the atom-token matrix $\mathbf{H} \in \mathbb{R}^{N \times d_H}$.

Decoding the atom type from a latent token is performed by cosine similarity matching against the dictionary. This continuous, chemistry-aware embedding both reduces model size (from up to 89 one-hot to 16 dimensions) and encodes chemical similarity directly into the atom latent space, facilitating diffusion-based generative modeling and mitigating memorization artifacts for frequent compositions (Veljković et al., 2 Apr 2026).
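The encode and decode steps can be illustrated with a small sketch. Everything element-specific here is an assumption made for illustration: the five-element table, the four-entry (s, p, d, f) occupancy vector normalized by shell capacity, and the use of the 19th group slot for elements without a group number. Standardization, weighting, and PCA are omitted, so the descriptors stay at their raw 34 dimensions.

```python
import numpy as np

BLOCKS = ["s", "p", "d", "f"]
MAX_OCC = np.array([2.0, 6.0, 10.0, 14.0])  # s, p, d, f shell capacities

# Hypothetical mini periodic table: (period, group, block, valence s/p/d/f occupancies).
ELEMENTS = {
    "H": (1, 1, "s", (1, 0, 0, 0)),
    "C": (2, 14, "p", (2, 2, 0, 0)),
    "O": (2, 16, "p", (2, 4, 0, 0)),
    "Na": (3, 1, "s", (1, 0, 0, 0)),
    "Fe": (4, 8, "d", (2, 0, 6, 0)),
}


def one_hot(index, size):
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec


def descriptor(symbol):
    """Concatenate period (7D), group (19D), block (4D), and normalized occupancies (4D)."""
    period, group, block, occ = ELEMENTS[symbol]
    return np.concatenate([
        one_hot(period - 1, 7),
        one_hot(group - 1, 19),  # slot 19 assumed reserved for elements without a group number
        one_hot(BLOCKS.index(block), 4),
        np.array(occ) / MAX_OCC,
    ])


SYMBOLS = list(ELEMENTS)
DICTIONARY = np.stack([descriptor(s) for s in SYMBOLS])  # shape (5, 34)


def decode(tokens):
    """Map each latent token to the dictionary element with the highest cosine similarity."""
    t = tokens / np.linalg.norm(tokens, axis=-1, keepdims=True)
    d = DICTIONARY / np.linalg.norm(DICTIONARY, axis=-1, keepdims=True)
    return [SYMBOLS[i] for i in np.argmax(t @ d.T, axis=-1)]


# Atom-token matrix for a toy composition, perturbed as a denoised sample might be.
crystal = ["Na", "O", "Fe", "H"]
H = np.stack([descriptor(s) for s in crystal])
rng = np.random.default_rng(0)
noisy = H + 0.05 * rng.standard_normal(H.shape)
decoded = decode(noisy)
```

Because H and Na share group, block, and valence occupancy in this table, their descriptors differ only in the period slot, which is the kind of chemical similarity the continuous embedding is meant to expose.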

3. Geometry Enhancement Module (GEM)

GEM introduces periodic crystal geometry into multi-head self-attention by computing a head-wise geometric bias for every atom pair. The module computes minimum-image displacements between fractional coordinates on the periodic torus and measures them with the lattice metric tensor, which is built from the lattice matrix reconstructed from the lattice latent $\mathbf{y}$. The resulting normalized Cartesian distance and edge features are the inputs to two bias branches per attention head.

Both branches are scaled by a noise-dependent gate and summed. The resulting additive bias enters the attention logits, so the attention mechanism responds directly to periodic atom-pair geometry while keeping standard Transformer operations and their computational complexity. GEM thereby encodes the periodic and geometric relations fundamental to crystalline materials (Veljković et al., 2 Apr 2026).
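The geometric pipeline can be sketched as follows. The minimum-image wrap, the metric tensor, and the additive attention bias are standard constructions; the specific bias used here (a per-head linear distance penalty with a scalar gate) is a hypothetical stand-in, since the source's bias formulas are not reproduced in this summary. The lattice matrix is passed in directly rather than reconstructed from the lattice latent, distances are left unnormalized, and wrapping by rounding in fractional space is only guaranteed to give the shortest Cartesian image for cells that are not strongly skewed.

```python
import numpy as np


def min_image_displacements(frac):
    """Pairwise fractional displacements f_j - f_i, wrapped to [-0.5, 0.5] on the unit torus."""
    delta = frac[None, :, :] - frac[:, None, :]
    return delta - np.round(delta)


def periodic_distances(frac, lattice):
    """Cartesian pair distances via the metric tensor G = L L^T (rows of L are lattice vectors)."""
    delta = min_image_displacements(frac)
    metric = lattice @ lattice.T
    sq = np.einsum("ija,ab,ijb->ij", delta, metric, delta)
    return np.sqrt(np.maximum(sq, 0.0))


def geometric_bias(dist, slopes, gate):
    """Hypothetical bias: per-head linear distance penalty, scaled by a noise-dependent gate."""
    return -gate * slopes[:, None, None] * dist[None, :, :]  # (heads, N, N)


def biased_attention(q, k, v, bias):
    """Scaled dot-product attention with an additive bias on the logits."""
    logits = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1]) + bias
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Toy example: three atoms in a 4 Angstrom cubic cell, two attention heads.
lattice = 4.0 * np.eye(3)
frac = np.array([[0.05, 0.0, 0.0], [0.95, 0.0, 0.0], [0.5, 0.5, 0.5]])
dist = periodic_distances(frac, lattice)
bias = geometric_bias(dist, slopes=np.array([0.5, 1.0]), gate=1.0)

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((2, 3, 8)) for _ in range(3))
out, weights = biased_attention(q, k, v, bias)
```

The first two atoms sit at fractional x = 0.05 and 0.95, so the wrapped separation is 0.1 of the cell edge (0.4 Å) rather than 0.9 of it, which is the effect the periodic treatment is meant to capture.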

4. Training, Sampling Efficiency, and Baseline Comparison

Crystalite is a 14-layer Transformer with 16 attention heads, trained in bfloat16 with AdamW at batch size 128 and an exponential moving average (EMA) of the weights. Sampling 1 000 crystals takes 22.4 s on H100 GPU hardware, or 5.1 s with FlashAttention and bfloat16, which the source reports as faster than the baselines it compares against: MatterGen, FlowMM, DiffCSP, and CrystalDiT.

This increased efficiency is attributed to the architectural simplicity, Subatomic Tokenization, and lightweight geometry bias of GEM, avoiding high-cost tensor equivariance and per-step message passing (Veljković et al., 2 Apr 2026).

5. Empirical Results on Crystal Structure Prediction and Generation

On three CSP benchmarks—MP-20, MPTS-52, and Alex-MP-20—Crystalite achieves state-of-the-art results in both match rate (MR) and geometric RMSE. For example:

Benchmark     Best prior (MR / RMSE in Å)     Crystalite (MR / RMSE in Å)
MP-20         KLDM: 65.8% / 0.0517            66.1% / 0.0329
MPTS-52       KLDM: 23.9% / 0.1276            31.5% / 0.0701
Alex-MP-20    OMatG: 64.7% / 0.1251           67.5% / 0.0335
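To make the size of these gaps easier to read, the short script below derives the match-rate gain (in percentage points) and the relative RMSE reduction from the reported figures. It only rearranges the numbers in the table; the dictionary layout and function name are illustrative.

```python
# (match rate in %, RMSE in Angstrom) as reported for each benchmark.
results = {
    "MP-20": {"prior": (65.8, 0.0517), "crystalite": (66.1, 0.0329)},
    "MPTS-52": {"prior": (23.9, 0.1276), "crystalite": (31.5, 0.0701)},
    "Alex-MP-20": {"prior": (64.7, 0.1251), "crystalite": (67.5, 0.0335)},
}


def compare(prior, ours):
    """Return (match-rate gain in points, relative RMSE reduction in percent)."""
    mr_gain = ours[0] - prior[0]
    rmse_cut = 100.0 * (prior[1] - ours[1]) / prior[1]
    return mr_gain, rmse_cut


summary = {name: compare(r["prior"], r["crystalite"]) for name, r in results.items()}
for name, (mr_gain, rmse_cut) in summary.items():
    print(f"{name}: +{mr_gain:.1f} pts match rate, {rmse_cut:.1f}% lower RMSE")
```

On these figures the match-rate gain is largest on MPTS-52 (7.6 points), while the RMSE reduction is largest on Alex-MP-20 (about 73%).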

In de novo generation of 10 000 crystals, Crystalite attains the highest S.U.N. discovery score (48.6%), best density Wasserstein distance (0.046), and fastest generation rate (22.4 s, 5.1 s optimized per 1 k samples), outperforming all evaluated alternatives (Veljković et al., 2 Apr 2026).

6. Ablation Analyses and Inductive Bias Effects

Ablation studies indicate that GEM accelerates stability learning (raising mid-training stability by ~10% and S.U.N. by ~5%) and reduces geometric RMSE by ~20% in CSP, with minimal impact on match rate. Subatomic Tokenization is argued to be essential for avoiding memorization of high-frequency compositions and for explicitly embedding chemical similarities, but its effect is not quantified directly in the ablation tables. This suggests that both inductive biases contribute non-trivially to sample efficiency and output quality, especially in distributional robustness and geometric fidelity (Veljković et al., 2 Apr 2026).

7. Limitations and Prospects

Limitations of the current Crystalite formulation include reliance on a fixed Niggli cell convention (no basis augmentation), a single-sample CSP setting, and dependence on an external machine-learned interatomic potential (MLIP) for stability evaluation. Potential directions for extension are multi-sample Bayesian CSP, integrated cell-basis equivariance or Wyckoff symmetry conditioning, joint end-to-end training with stability predictors, and scaling to large systems via sparse attention or memory-optimized GEM computation.

This summary is based on information and results reported in "Crystalite: A Lightweight Transformer for Efficient Crystal Modeling" (Veljković et al., 2 Apr 2026).
