Crystalite: Efficient Diffusion Transformer for Crystals
- Crystalite is a lightweight diffusion Transformer model designed for efficient crystal modeling by embedding chemical and geometric information directly into its architecture.
- It employs Subatomic Tokenization and the Geometry Enhancement Module (GEM) to incorporate chemical descriptors and periodic geometric bias into transformer attention, improving crystal structure prediction (CSP) performance.
- Empirical results show that Crystalite outperforms geometry-heavy baselines in terms of match rate, RMSE, and generation speed, achieving state-of-the-art performance on multiple benchmarks.
Crystalite is a lightweight diffusion Transformer model for generative modeling of crystalline materials, designed to balance efficiency and crystalline inductive bias without the computational burden of full equivariant graph neural networks. It introduces two principal mechanisms, Subatomic Tokenization and the Geometry Enhancement Module (GEM), to incorporate chemical and geometric information directly into transformer attention. This enables high-fidelity crystal structure prediction (CSP) and de novo crystal generation with faster sampling than existing geometry-heavy baselines (Veljković et al., 2 Apr 2026).
1. Model Formulation and Architecture
Crystalite operates on three continuous input channels per crystal: chemically structured atom tokens $H$, fractional atom coordinates $F$, and a six-dimensional lattice latent $\ell$ parameterizing the unit cell. The input sequence consists of $N$ atom tokens and a single “lattice” token, each token embedding both chemical identity and spatial information in a common hidden space of dimension $d$.
A noise-level conditioning vector is produced via a small MLP and applied to all transformer blocks via adaptive layer normalization. Internally, the architecture stacks standard Transformer blocks with multi-head self-attention, each enhanced by the Geometry Enhancement Module. Output heads produce estimates for atomic identity, positions, and lattice parameters. The model is trained in the continuous EDM (score-matching diffusion) framework, perturbing all channels jointly with isotropic Gaussian noise parameterized by a noise level $\sigma$, and optimizing a weighted combination of mean squared errors across channels: $\mathcal{L} = \lambda_H \mathcal{L}_H + \lambda_F \mathcal{L}_F + \lambda_\ell \mathcal{L}_\ell$, with per-channel weights $\lambda_H, \lambda_F, \lambda_\ell$. Sampling employs the EDM Heun sampler with 150 steps, optionally accelerated by channel anti-annealing to expedite convergence on slower-denoising channels.
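As an illustration of the sampling step, the following is a minimal sketch of a deterministic Heun (second-order) sampler in the EDM style. It is not the authors' implementation: the noise schedule (`sigma_min`, `sigma_max`, `rho`) uses common EDM defaults that the source does not confirm for Crystalite, the names `karras_sigmas`, `heun_sample`, and `toy_denoise` are hypothetical, and the denoiser is a toy stand-in for the trained network. Only the step count of 150 comes from the text.

```python
import math

def karras_sigmas(n_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise levels from sigma_max down to sigma_min, followed by a final 0.

    The default values are assumed EDM defaults, not values from the source.
    """
    lo, hi = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    sigmas = [(hi + i / (n_steps - 1) * (lo - hi)) ** rho for i in range(n_steps)]
    return sigmas + [0.0]

def heun_sample(denoise, x, sigmas):
    """Integrate dx/dsigma = (x - D(x, sigma)) / sigma with Heun's method."""
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        # Slope at the current noise level.
        d_cur = [(xi - di) / s_cur for xi, di in zip(x, denoise(x, s_cur))]
        x_euler = [xi + (s_next - s_cur) * di for xi, di in zip(x, d_cur)]
        if s_next == 0.0:
            # The last step is plain Euler: the slope is undefined at sigma = 0.
            x = x_euler
            continue
        # Second-order correction: average the slopes at both ends of the step.
        d_next = [(xi - di) / s_next
                  for xi, di in zip(x_euler, denoise(x_euler, s_next))]
        x = [xi + (s_next - s_cur) * 0.5 * (a + b)
             for xi, a, b in zip(x, d_cur, d_next)]
    return x

# Toy denoiser: the "data distribution" is a single point, so the ideal
# denoiser returns that point regardless of the noisy input.
target = [0.25, 0.5, 0.75]

def toy_denoise(x, sigma):
    return list(target)

sigmas = karras_sigmas(150)                      # 150 steps, as in the text
x_init = [sigmas[0] * v for v in (1.0, -0.5, 0.3)]  # stand-in for Gaussian noise
sample = heun_sample(toy_denoise, x_init, sigmas)
```

In Crystalite all three channels would be denoised jointly by the Transformer; the flat list here stands in for that state only to keep the sketch short.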
2. Subatomic Tokenization
Crystalite replaces one-hot Z-based atom type encoding with Subatomic Tokenization, representing each element $Z$ by a compact, chemically structured descriptor $h(Z)$. This encoding concatenates:
- one-hot period (7D)
- one-hot group (19D)
- one-hot block (4D)
- normalized ground-state valence occupancies
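Features are standardized and weighted, then optionally projected via PCA to dimension $d_H$ (typically 16). For crystals with atoms $Z_1, \dots, Z_N$, the atom-token channel stacks the per-element descriptors $h(Z_i)$:
$H = [\,h(Z_1), \dots, h(Z_N)\,]^\top \in \mathbb{R}^{N \times d_H}$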
Decoding the atom type from a latent token is performed by cosine similarity matching against the dictionary. This continuous, chemistry-aware embedding both reduces model size (from up to 89 one-hot to 16 dimensions) and encodes chemical similarity directly into the atom latent space, facilitating diffusion-based generative modeling and mitigating memorization artifacts for frequent compositions (Veljković et al., 2 Apr 2026).
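The encode and decode steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's code: the six-element table, the four-slot (s, p, d, f) occupancy layout, normalization by subshell capacity, and the use of group slot 0 for elements without a group are all assumptions, and the standardization, weighting, and PCA steps are omitted. The names `descriptor`, `cosine`, and `decode` are hypothetical.

```python
import math

BLOCKS = ("s", "p", "d", "f")
CAPACITY = {"s": 2, "p": 6, "d": 10, "f": 14}  # assumed normalization constants

# symbol: (period, group, block, ground-state valence occupancies)
ELEMENTS = {
    "H":  (1, 1,  "s", {"s": 1}),
    "C":  (2, 14, "p", {"s": 2, "p": 2}),
    "O":  (2, 16, "p", {"s": 2, "p": 4}),
    "Na": (3, 1,  "s", {"s": 1}),
    "Cl": (3, 17, "p", {"s": 2, "p": 5}),
    "Fe": (4, 8,  "d", {"s": 2, "d": 6}),
}

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def descriptor(symbol):
    """Concatenate period (7D), group (19D), block (4D), and occupancies (4D)."""
    period, group, block, occ = ELEMENTS[symbol]
    return (
        one_hot(period - 1, 7)
        + one_hot(group, 19)  # slot 0 assumed reserved for "no group"
        + one_hot(BLOCKS.index(block), 4)
        + [occ.get(b, 0) / CAPACITY[b] for b in BLOCKS]
    )

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def decode(token):
    """Return the dictionary element whose descriptor is most similar."""
    return max(ELEMENTS, key=lambda s: cosine(token, descriptor(s)))

# A slightly perturbed iron token, standing in for an imperfect denoised output.
noisy_fe = [x + 0.05 * math.sin(i) for i, x in enumerate(descriptor("Fe"))]
decoded = decode(noisy_fe)
```

Because chemically related elements share period, group, or block entries, their descriptors overlap, which is the sense in which chemical similarity is built into the latent space.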
3. Geometry Enhancement Module (GEM)
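GEM introduces periodic crystal geometry into the multi-head self-attention by computing a head-wise geometric bias $B^{(h)}_{ij}$ for every atom pair $(i, j)$. The module computes minimum-image displacements $\Delta f_{ij}$ on the periodic torus and measures them with the metric tensor $G = L^\top L$, where $L$ is the lattice matrix reconstructed from the six-dimensional lattice latent. The normalized Cartesian distance $\bar{d}_{ij}$, derived from $\sqrt{\Delta f_{ij}^\top G \, \Delta f_{ij}}$, and edge features are inputs to two bias terms per attention head:
- a distance term $b^{\mathrm{dist},(h)}_{ij}$, computed from $\bar{d}_{ij}$
- an edge term $b^{\mathrm{edge},(h)}_{ij}$, computed from the edge features
Both branches are scaled by a noise-dependent gate $g(\sigma)$ and summed: $B^{(h)}_{ij} = g(\sigma)\,\bigl(b^{\mathrm{dist},(h)}_{ij} + b^{\mathrm{edge},(h)}_{ij}\bigr)$. This additive bias is included in the attention logits: $\mathrm{logit}^{(h)}_{ij} = q^{(h)\top}_i k^{(h)}_j / \sqrt{d_k} + B^{(h)}_{ij}$. Thus, the attention mechanism is directly responsive to periodic atom-pair geometry while maintaining standard Transformer operations and $O(N^2)$ computational complexity. GEM efficiently encodes key periodic and geometric relations fundamental to crystalline materials (Veljković et al., 2 Apr 2026).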
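The geometric part of this computation can be sketched as below: fractional displacements are wrapped onto the periodic torus, their length is measured with the metric tensor, and a bias is added to the attention logits. Several parts are assumptions made for illustration: the linear penalty `-slope * distance` is a stand-in for the learned distance branch, the edge branch and the noise-dependent gate are omitted, and the function names are hypothetical. Wrapping each fractional component independently yields the true minimum image only for sufficiently compact (for example, reduced) cells.

```python
import math

def metric_tensor(lattice):
    """G = L^T L for a 3x3 lattice matrix whose columns are the cell vectors."""
    return [
        [sum(lattice[k][a] * lattice[k][b] for k in range(3)) for b in range(3)]
        for a in range(3)
    ]

def wrap(delta):
    """Map each fractional displacement component into [-0.5, 0.5)."""
    return [d - math.floor(d + 0.5) for d in delta]

def periodic_distance(f_i, f_j, G):
    """Cartesian length of the wrapped displacement: sqrt(df^T G df)."""
    df = wrap([b - a for a, b in zip(f_i, f_j)])
    return math.sqrt(
        sum(df[a] * G[a][b] * df[b] for a in range(3) for b in range(3))
    )

def biased_logits(q, k, frac, G, slope):
    """Scaled dot-product logits plus an assumed linear distance penalty.

    All N x N pairs are visited, matching the quadratic cost of dense attention.
    """
    n, d_k = len(q), len(q[0])
    return [
        [
            sum(x * y for x, y in zip(q[i], k[j])) / math.sqrt(d_k)
            - slope * periodic_distance(frac[i], frac[j], G)
            for j in range(n)
        ]
        for i in range(n)
    ]

# Cubic cell with a = 4: two atoms that are close only across the boundary.
lattice = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
frac = [[0.1, 0.0, 0.0], [0.9, 0.0, 0.0]]
G = metric_tensor(lattice)
dist = periodic_distance(frac[0], frac[1], G)

zeros = [[0.0, 0.0], [0.0, 0.0]]  # zero queries/keys isolate the bias term
logits = biased_logits(zeros, zeros, frac, G, slope=1.0)
```

In this example the two atoms are 0.8 fractional units apart inside the cell but only 0.2 apart across the periodic boundary, so the bias is computed from the shorter image.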
4. Training, Sampling Efficiency, and Baseline Comparison
Crystalite comprises 1 parameters in a 14-layer Transformer with 16 heads (hidden width 2), trained in bfloat16 for 3 million steps (batch 128, AdamW, learning rate 4, EMA 5). During sampling, 1,000 crystals require 22.4 s (or 5.1 s with FlashAttention and bfloat16) on H100 GPU hardware, making Crystalite faster than previous geometry-heavy baselines:
- MatterGen: 8 s/1k
- FlowMM: 9 s/1k
- DiffCSP: 0 s/1k
- CrystalDiT: 1 s/1k
- Crystalite: 22.4 s/1k (5.1 s optimized)
This increased efficiency is attributed to the architectural simplicity, Subatomic Tokenization, and lightweight geometry bias of GEM, avoiding high-cost tensor equivariance and per-step message passing (Veljković et al., 2 Apr 2026).
5. Empirical Results on Crystal Structure Prediction and Generation
On three CSP benchmarks—MP-20, MPTS-52, and Alex-MP-20—Crystalite achieves state-of-the-art results in both match rate (MR) and geometric RMSE. For example:
| Benchmark | Best Prior MR / RMSE | Crystalite MR / RMSE (Å) |
|---|---|---|
| MP-20 | KLDM: 65.8% / 0.0517 | 66.1% / 0.0329 |
| MPTS-52 | KLDM: 23.9% / 0.1276 | 31.5% / 0.0701 |
| Alex-MP-20 | OMatG: 64.7% / 0.1251 | 67.5% / 0.0335 |
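To make the size of these gaps explicit, the short sketch below computes the match-rate gain in percentage points and the relative RMSE reduction from the table values. The variable and function names are illustrative; the numbers are copied from the table and nothing else is assumed.

```python
# benchmark: (prior MR %, prior RMSE, Crystalite MR %, Crystalite RMSE)
results = {
    "MP-20":      (65.8, 0.0517, 66.1, 0.0329),
    "MPTS-52":    (23.9, 0.1276, 31.5, 0.0701),
    "Alex-MP-20": (64.7, 0.1251, 67.5, 0.0335),
}

def summarize(prior_mr, prior_rmse, new_mr, new_rmse):
    """Return (MR gain in percentage points, relative RMSE reduction in %)."""
    mr_gain = round(new_mr - prior_mr, 1)
    rmse_reduction = round(100 * (prior_rmse - new_rmse) / prior_rmse, 1)
    return mr_gain, rmse_reduction

summary = {name: summarize(*row) for name, row in results.items()}
for name, (mr_gain, rmse_reduction) in summary.items():
    print(f"{name}: +{mr_gain} pp match rate, {rmse_reduction}% lower RMSE")
```

The match-rate gain is largest on MPTS-52 (7.6 points), while the relative RMSE reduction is largest on Alex-MP-20 (about 73%).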
In de novo generation of 10 000 crystals, Crystalite attains the highest S.U.N. discovery score (48.6%), the best density Wasserstein distance (0.046), and the fastest generation rate (22.4 s per 1k samples, 5.1 s optimized), outperforming all evaluated alternatives (Veljković et al., 2 Apr 2026).
6. Ablation Analyses and Inductive Bias Effects
Ablation studies indicate that GEM accelerates stability learning (raising mid-training stability by ~10% and S.U.N. by ~5%) and reduces geometric RMSE by ~20% in CSP, with minimal impact on match rate. Subatomic Tokenization is argued to be essential for avoiding memorization of high-frequency compositions and for explicitly embedding chemical similarities, but its effect is not quantified directly in the ablation tables. Taken together, this suggests that both inductive biases contribute non-trivially to sample efficiency and output quality, especially in distributional robustness and geometric fidelity (Veljković et al., 2 Apr 2026).
7. Limitations and Prospects
Limitations of the current Crystalite formulation include reliance on a fixed Niggli cell convention (no basis augmentation), a single-sample CSP setting, and dependence on an external machine-learned interatomic potential (MLIP) for stability evaluation. Potential directions for extension are multi-sample Bayesian CSP, integrated cell-basis equivariance or Wyckoff symmetry conditioning, joint end-to-end training with stability predictors, and scaling to large systems via sparse attention or memory-optimized GEM computation.
This summary is based on information and results reported in "Crystalite: A Lightweight Transformer for Efficient Crystal Modeling" (Veljković et al., 2 Apr 2026).