CrystaLLM-π: Property Injection for Crystals
- CrystaLLM-π is a transformer model that injects continuous property information directly into attention mechanisms to generate and recover crystal structures.
- It introduces property-key-value (PKV) prefix and residual attention variants that improve performance in both structure recovery and forward materials design.
- The method utilizes symmetry-aware tokenization and dual-objective training, ensuring robust structure-property correlations and effective inverse design.
CrystaLLM-π denotes a property-injection methodology for transformer-based generative models of crystal structures, enabling conditional generation and recovery of crystalline materials with explicit control over physicochemical properties. The core premise is the direct integration of continuous property information—such as band gap, density, or high-dimensional X-ray diffraction (XRD) descriptors—into the model’s attention mechanism or input context. This approach targets the inverse problem of discovering or reconstructing crystal structures that satisfy specified functional constraints, achieving state-of-the-art performance in both structure recovery from experimental data and forward materials design (Wang et al., 27 Aug 2025, Bone et al., 26 Nov 2025).
1. Foundations of Property Injection in Generative Crystal Models
Property injection in CrystaLLM-π addresses the fundamental limitation of token-level conditioning for continuous or high-dimensional property targets within autoregressive transformers trained on crystallographic data. Traditional digit-token or prompt-append methods fail to robustly link target properties to the complex, symmetry-governed output space of crystal structures. CrystaLLM-π solves this by projecting the property vector into the transformer’s attention computation itself, establishing persistent, differentiable structure-property correlations throughout all network layers (Bone et al., 26 Nov 2025).
This is in contrast to prior LLM approaches for crystal generation, which appended property information to textual prompts but did not exploit architectural mechanisms to bind properties and structure generation in a principled manner (Wang et al., 27 Aug 2025). The method is applicable to inverse design—generating crystal candidates with prescribed functional properties—and to recovery tasks, such as reconstructing atomic arrangements from incomplete or noisy measurement data (e.g., XRD patterns).
2. Property Embedding and Architectural Integration
Each crystalline material is associated with a real-valued condition vector $\mathbf{c} \in \mathbb{R}^{d_c}$, representing scalar or high-dimensional targets (e.g., formation energy, band gap, XRD features). A dedicated Condition Encoder, implemented as a small multi-layer perceptron (MLP) with layer normalization and a nonlinearity, transforms $\mathbf{c}$ into a learned embedding tensor for all transformer layers, accommodating the dimensions required for property-key-value (PKV) augmentation (Bone et al., 26 Nov 2025).
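A minimal PyTorch sketch of such a condition encoder is shown below, assuming a condition dimension `d_cond`, `n_layers` decoder blocks, `n_heads` heads of width `d_head`, and a single ghost token per layer; the module name and tensor layout are illustrative assumptions rather than the published implementation.

```python
import torch
import torch.nn as nn

class ConditionEncoder(nn.Module):
    """Project a continuous property vector into per-layer key/value prefixes (illustrative)."""

    def __init__(self, d_cond: int, n_layers: int, n_heads: int, d_head: int, n_prefix: int = 1):
        super().__init__()
        self.n_layers, self.n_heads, self.d_head, self.n_prefix = n_layers, n_heads, d_head, n_prefix
        d_model = n_heads * d_head
        # Small MLP with layer normalization and a nonlinearity, as described above.
        self.mlp = nn.Sequential(
            nn.LayerNorm(d_cond),
            nn.Linear(d_cond, d_model),
            nn.GELU(),
            # Emit key and value tensors (factor 2) for every layer and ghost token.
            nn.Linear(d_model, n_layers * 2 * n_prefix * d_model),
        )

    def forward(self, c: torch.Tensor) -> torch.Tensor:
        # c: (batch, d_cond) -> (n_layers, 2, batch, n_heads, n_prefix, d_head)
        batch = c.shape[0]
        out = self.mlp(c)
        out = out.view(batch, self.n_layers, 2, self.n_prefix, self.n_heads, self.d_head)
        return out.permute(1, 2, 0, 4, 3, 5)  # layer-major layout for per-block indexing
```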
Two principal architectural variants are introduced:
- PKV Prefix Attention: The encoded property embedding is split into key and value components and concatenated to the token-derived keys and values at each decoder block. For head $h$ in block $\ell$, attention is computed over both the regular sequence and the property-augmented “ghost” tokens:

$$\mathrm{Attn}_h^{(\ell)} = \mathrm{softmax}\!\left(\frac{Q_h^{(\ell)}\big[K_{p,h}^{(\ell)};\,K_h^{(\ell)}\big]^{\top}}{\sqrt{d_k}}\right)\big[V_{p,h}^{(\ell)};\,V_h^{(\ell)}\big],$$

where $[\,\cdot\,;\,\cdot\,]$ denotes concatenation along the sequence dimension and $K_{p,h}^{(\ell)}, V_{p,h}^{(\ell)}$ are the property-derived keys and values.
- PKV Residual Attention: A base attention output and a residual property attention are linearly interpolated via a learned scalar $\alpha_\ell$ per layer (initialized to zero), permitting controlled and gradual property injection during fine-tuning:

$$\mathrm{Attn}^{(\ell)} = (1 - \alpha_\ell)\,A_{\mathrm{base}}^{(\ell)} + \alpha_\ell\,A_{\mathrm{prop}}^{(\ell)},$$

where $A_{\mathrm{base}}^{(\ell)}$ is computed from the regular tokens and $A_{\mathrm{prop}}^{(\ell)}$ from the property keys/values.
This design bypasses the brittleness of digit-token conditioning, mitigates catastrophic forgetting, and keeps pre-trained crystallographic knowledge intact.
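The sketch below illustrates both injection variants for a single attention head, assuming per-head tensors `q`, `k`, `v` of shape `(batch, seq, d_head)` and a property prefix `k_p`, `v_p` of shape `(batch, n_prefix, d_head)` produced by a condition encoder such as the one above; causal masking and multi-head bookkeeping are omitted for brevity, and the function names are assumptions.

```python
import math
import torch
import torch.nn.functional as F

def pkv_prefix_attention(q, k, v, k_p, v_p):
    """Prefix variant: property 'ghost' tokens are prepended to the keys and values."""
    k_aug = torch.cat([k_p, k], dim=1)  # (batch, n_prefix + seq, d_head)
    v_aug = torch.cat([v_p, v], dim=1)
    scores = q @ k_aug.transpose(-2, -1) / math.sqrt(q.shape[-1])
    return F.softmax(scores, dim=-1) @ v_aug  # causal mask omitted for brevity

def pkv_residual_attention(q, k, v, k_p, v_p, gate):
    """Residual variant: base and property attention mixed by a learned per-layer scalar."""
    d = q.shape[-1]
    base = F.softmax(q @ k.transpose(-2, -1) / math.sqrt(d), dim=-1) @ v
    prop = F.softmax(q @ k_p.transpose(-2, -1) / math.sqrt(d), dim=-1) @ v_p
    # `gate` (alpha_l) is a learned scalar initialized to zero, so fine-tuning starts
    # from the unmodified pre-trained attention output and injects properties gradually.
    return (1.0 - gate) * base + gate * prop
```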
3. Tokenization and Task Templates
CrystaLLM-π employs symmetry-aware tokenization based on the SGS (Space-Group + Wyckoff Site) scheme to represent crystals efficiently. This approach emits:
- The space-group symbol as one token.
- Lattice parameters (a, b, c, α, β, γ) as six numerical tokens.
- One element symbol and fractional coordinate triple per unique Wyckoff site.
Such tokenization leverages group-theoretical constraints, drastically reducing sequence length and focusing learning on symmetry-determined structural degrees of freedom.
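As a concrete illustration, a hypothetical SGS-style token sequence for rock-salt NaCl (space group Fm-3m, Na on Wyckoff site 4a, Cl on 4b) might read as follows; the exact token spellings and formatting are assumptions, not the paper's vocabulary.

```python
# Hypothetical SGS-style sequence for rock-salt NaCl (illustrative only).
tokens = [
    "Fm-3m",                                   # space-group symbol as a single token
    "5.64", "5.64", "5.64", "90", "90", "90",  # lattice parameters a, b, c, alpha, beta, gamma
    "Na", "0.0", "0.0", "0.0",                 # element + fractional coordinates (Wyckoff 4a)
    "Cl", "0.5", "0.5", "0.5",                 # element + fractional coordinates (Wyckoff 4b)
]
```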
Task templates are used for zero-shot and few-shot instruction:
- Zero-shot: Natural-language prompts describing the target property, formula, and desired generation.
- Few-shot: Interleaved demonstrations of K sample crystals with their property descriptions and structures, followed by a query for generation conditioned on an additional target property.
This hybrid approach instructs the model to infer structure-property mappings both from explicit condition statements and by analogy to demonstration exemplars (Wang et al., 27 Aug 2025).
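A schematic of how such a K-shot prompt could be assembled is given below; the wording, field order, and property strings are illustrative assumptions rather than the published templates.

```python
def build_few_shot_prompt(demos, target_property, target_formula):
    """Interleave K (property, structure) demonstrations, then query a new target."""
    parts = []
    for prop, structure in demos:
        parts.append(f"Property: {prop}\nStructure: {structure}\n")
    # Final query: condition on the target property and formula, structure left blank.
    parts.append(f"Property: {target_property}\nFormula: {target_formula}\nStructure:")
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    demos=[("band gap 1.1 eV", "<SGS tokens of demonstration crystal 1>"),
           ("band gap 2.3 eV", "<SGS tokens of demonstration crystal 2>")],
    target_property="band gap 1.4 eV",
    target_formula="CsSnI3",
)
```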
4. Training Objectives and Fine-Tuning Strategy
The CrystaLLM-π training regime consists of:
- Primary objective: Causal cross-entropy loss $\mathcal{L}_{\mathrm{gen}}$ for next-token prediction over the crystallographic vocabulary.
- Auxiliary property-prediction task: Fill-in-the-blank loss, where the model predicts the property from a prompt combining the structure string with a masked-out property token,

$$\mathcal{L}_{\mathrm{prop}} = -\sum_{t \in \mathcal{P}} \log p_\theta\!\left(y_t \mid \text{structure tokens},\, y_{<t}\right),$$

where $\mathcal{P}$ indexes the masked property positions.
- Total loss: A weighted combination $\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{gen}} + \lambda\,\mathcal{L}_{\mathrm{prop}}$, with $\lambda$ held at a fixed value in practice.
A dual learning rate scheme is used: backbone parameters (pre-trained on large corpora such as LeMaterial, 4.35M CIFs) are updated more conservatively than the conditioning layers, thus preserving structural priors while efficiently adapting to new property-conditioned tasks (Bone et al., 26 Nov 2025).
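A PyTorch sketch of the dual-objective loss and dual learning-rate setup, assuming the backbone and conditioning modules are exposed as separate `nn.Module`s; the loss weight and learning rates are placeholders, not the paper's values.

```python
import torch
import torch.nn.functional as F

def total_loss(gen_logits, gen_targets, prop_logits, prop_targets, lam=1.0):
    """L_total = L_gen + lambda * L_prop, both as causal cross-entropy over token ids."""
    l_gen = F.cross_entropy(gen_logits.transpose(1, 2), gen_targets)     # next-token prediction
    l_prop = F.cross_entropy(prop_logits.transpose(1, 2), prop_targets)  # masked property tokens
    return l_gen + lam * l_prop

def make_optimizer(backbone: torch.nn.Module, cond_encoder: torch.nn.Module):
    """Dual learning rates: conservative for the pre-trained backbone, larger for new layers."""
    return torch.optim.AdamW([
        {"params": backbone.parameters(), "lr": 1e-5},      # placeholder values,
        {"params": cond_encoder.parameters(), "lr": 1e-4},  # not the published settings
    ])
```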
5. Performance Benchmarks and Quantitative Results
Performance is validated across several conditional generation and recovery tasks:
| Task / Metric | Baseline | CrystaLLM-π (best) | Relative Change |
|---|---|---|---|
| 3-shot formation-energy match (MP-20, SGS tokenization) | 0.798 | 0.939 | +17.6% |
| Band-gap match (MP-20, SGS tokenization) | 0.637 | 0.745 | +17.0% |
| Structure recovery (MP-20 XRD, 1-shot, perplexity-ranked) | PXRDGen: 68.68% | 69.01% (w/o validity screen) | Comparable, fewer params |
| Lattice MAE reduction vs DiffractGPT (Jarvis-DFT) | — | ~50% reduction | — |
| Photovoltaic discovery: VSUN (unique, stable / 100k) | — | 1,809 | — |
Key qualitative results include implicit learning of property distributions matching physical optima (e.g., predicted band gaps peaking at 1.2–1.4 eV without explicit band-gap supervision) and DFT validation of generated candidates with low formation energies and high predicted solar efficiency. In structure recovery, increased match rate and precision versus token-only models are observed, even with significantly fewer parameters (Wang et al., 27 Aug 2025, Bone et al., 26 Nov 2025).
6. Robustness, Sampling, and Ablation Studies
Ablation studies assessed the effect of conditioning architectures and pre-training:
- Pre-training improves validity and uniqueness, with property-conditioned models outperforming unconditioned or digit-token-conditioned counterparts by 20.4% in validity and 14.4% in VSUN.
- Prefix attention provides optimal steering accuracy when data are plentiful (e.g., large photovoltaic datasets), while residual attention offers smoother optimization and higher match rates in data-scarce settings (Fig. A.14 of (Bone et al., 26 Nov 2025)).
- Sampling is tuned via temperature and perplexity-based ranking of candidates; selecting the top-ranked (lowest-perplexity) candidate outperforms naïve or first-shot sampling (a sketch of this ranking step follows this list).
- Limitations include reduced performance for large or disordered unit cells (limited by transformer context), strong bias toward well-represented structures in the pretraining set, and an inability to model disorder or partial occupancy in current implementations.
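The ranking step referenced in the sampling bullet can be sketched as follows, assuming a HuggingFace-style causal LM interface (`model(ids).logits`); candidate continuations are assumed to have been sampled already, and all names are illustrative.

```python
import torch

@torch.no_grad()
def rank_candidates_by_perplexity(model, prompt_ids, candidate_ids):
    """Order sampled candidate continuations by perplexity, lowest (best) first."""
    scored = []
    for gen in candidate_ids:                                  # each gen: (1, gen_len) token ids
        seq = torch.cat([prompt_ids, gen], dim=1)
        logits = model(seq).logits[:, :-1]                     # position t predicts token t+1
        logp = torch.log_softmax(logits, dim=-1)
        tok_logp = logp.gather(-1, seq[:, 1:].unsqueeze(-1)).squeeze(-1)
        gen_logp = tok_logp[:, prompt_ids.shape[1] - 1:]       # log-probs of the continuation only
        ppl = torch.exp(-gen_logp.mean()).item()
        scored.append((ppl, gen))
    return sorted(scored, key=lambda t: t[0])
```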
7. Future Directions and Extensions
Anticipated advancements for CrystaLLM-π include:
- Scaling transformer backbones and deploying advanced positional encodings (e.g., rotary embeddings as in RoFormer) and long-range attention to accommodate larger and more complex crystal unit cells.
- Extending conditioning to support partial occupancies, disorder, and dopants by fine-tuning on appropriate datasets.
- Incorporating multi-modal targets (e.g., electron microscopy, Raman spectra) as conditioning vectors.
- Investigating interpretability of structure-property dependencies, e.g., via attention probe techniques visualizing how injected properties steer internal representations.
- Addressing training-set distributional bias to promote generalization beyond densely sampled regions (the current structure-recovery hit rate scales with local training-data density).
- A plausible implication is that architectural advances in property injection could facilitate broad unification of inverse design and structure elucidation tasks across materials domains.
CrystaLLM-π thus establishes a generalized, property-injected transformer paradigm—via attention-level conditioning, symmetry-aware representation, and hybrid instruction/fill-in-the-blank objectives—for robust, scalable, and controllable modeling of crystallographic structure-property landscapes (Wang et al., 27 Aug 2025, Bone et al., 26 Nov 2025).