CrystalBoltz: Bayesian Protein Structure

Updated 22 May 2026

CrystalBoltz is a generative framework that integrates a diffusion-based Boltz-2 prior with experimental X-ray crystallography data via Bayesian inference.
It employs differentiable, experiment-guided sampling to lower RMSD and R-factors while achieving a 33× speed-up relative to traditional refinement pipelines.
The method automates local refinement of atomic coordinates and B-factors, targeting standard crystallographic quality measures such as R-work and R-free.

CrystalBoltz is a generative framework for end-to-end protein structure determination from X-ray crystallography data, implementing Bayesian inference over atomic coordinates with direct conditioning on measured structure-factor amplitudes. Unlike traditional workflows, which rely on sequential manual refinement and limited integration of experimental data into generative models, CrystalBoltz unites a powerful diffusion-based prior (Boltz-2) with differentiable, experiment-guided sampling and automated refinement. The methodology achieves lower coordinate RMSD, lower R-factors, and significantly reduced turnaround time relative to previous state-of-the-art experimentally guided refinement pipelines (Kim et al., 15 May 2026).

1. Bayesian Formulation for Crystallography

CrystalBoltz frames protein structure determination as Bayesian inference of atomic coordinates $X$ (and isotropic B-factors $B$) given observed structure-factor amplitudes $|F_o|$. The sequence- and space-group-conditioned posterior is

$p(X\mid a,c,y) \propto p(y\mid X, a, c)\,p(X\mid a)$

where:

$a$ is the amino-acid sequence,
$c = (u, \mathcal{G})$ encodes unit cell and space group,
$y = |F_o|$ are observed amplitudes,
$p(X \mid a)$ is the learned Boltz-2 prior,
$p(y \mid X, a, c)$ is a differentiable likelihood provided by a structure-factor forward model.

This formalism unifies generative priors with experimental evidence, enabling direct sampling of plausible atomic models consistent with measured diffraction data.
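
As a rough illustration of the likelihood term $p(y \mid X, a, c)$, the sketch below computes structure-factor amplitudes for point atoms in a P1 cell and scores them with a Gaussian error model. It is a simplification under stated assumptions, not the CrystalBoltz forward model: atomic form factors, space-group symmetry, B-factors, and bulk solvent are omitted, and all function names and coordinates are hypothetical.

```python
import cmath
import math


def structure_factor_amplitudes(frac_coords, miller_indices, weights=None):
    """Return |F(h)| for point atoms in a P1 cell (toy model, no symmetry)."""
    if weights is None:
        weights = [1.0] * len(frac_coords)
    amplitudes = []
    for h, k, l in miller_indices:
        f = 0j
        for (x, y, z), w in zip(frac_coords, weights):
            # Each atom contributes a phase factor exp(2*pi*i * h.x).
            f += w * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
        amplitudes.append(abs(f))  # phases are discarded, as in the experiment
    return amplitudes


def gaussian_log_likelihood(f_obs, f_calc, sigma):
    """Log-likelihood of observed amplitudes under independent Gaussian errors."""
    norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(-0.5 * ((fo - fc) / sigma) ** 2 - norm
               for fo, fc in zip(f_obs, f_calc))


# Hypothetical three-atom model and a handful of reflections.
atoms = [(0.10, 0.20, 0.30), (0.40, 0.15, 0.70), (0.65, 0.80, 0.25)]
hkl = [(1, 0, 0), (0, 1, 0), (1, 1, 1), (2, 1, 0)]
f_obs = structure_factor_amplitudes(atoms, hkl)

# A common translation of all atoms changes phases but not amplitudes.
shifted = [((x + 0.25) % 1.0, (y + 0.10) % 1.0, z) for x, y, z in atoms]
f_shifted = structure_factor_amplitudes(shifted, hkl)

# Moving a single atom changes the amplitudes and lowers the likelihood.
perturbed = [(0.18, 0.20, 0.30)] + atoms[1:]
f_perturbed = structure_factor_amplitudes(perturbed, hkl)
ll_true = gaussian_log_likelihood(f_obs, f_obs, sigma=0.1)
ll_perturbed = gaussian_log_likelihood(f_obs, f_perturbed, sigma=0.1)
```

In this toy setting `f_shifted` matches `f_obs` because amplitudes carry no phase information, while `ll_perturbed` falls below `ll_true`. Adding the log-likelihood to a log-prior gives the unnormalized log-posterior of the formula above.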

2. Learned Generative Prior: Boltz-2 Diffusion Model

The generative prior, termed Boltz-2, is a sequence-conditioned denoising diffusion model over atomic coordinates, architecturally derived from AlphaFold3’s denoising network and trained on millions of publicly deposited PDB structures. The model learns to predict the score function $\nabla_{X_t} \log p_t(X_t \mid a)$ under a variance-preserving stochastic differential equation:

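$dX_t = -\tfrac{1}{2}\,\beta(t)\,X_t\,dt + \sqrt{\beta(t)}\,dW_t$

where $\beta(t)$ is the noise schedule and $W_t$ is a standard Wiener process.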

Key features include:

Diffusion/noise schedule with $T$ steps and variance-preserving $\beta(t)$;
Training to remove Gaussian noise from true protein structures, conditioning on the full sequence and optionally MSA or templates;
Guaranteed physical plausibility in samples at $t = 0$ due to diffusion-based learning.

The Boltz-2 prior ensures that the output models are not only consistent with general biophysical constraints but also tailored to the specific target sequence.
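
To make the variance-preserving forward process concrete, the following sketch noises a set of coordinates using the closed-form marginal $X_t = \sqrt{\bar\alpha_t}\,X_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\bar\alpha_t = \exp(-\int_0^t \beta(s)\,ds)$. The linear schedule and its endpoints are assumptions chosen for illustration; the source does not state the schedule Boltz-2 uses.

```python
import math
import random

# Hypothetical linear schedule endpoints (not taken from the source).
BETA_MIN, BETA_MAX = 0.1, 20.0


def beta(t):
    """Noise rate beta(t) on t in [0, 1]."""
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)


def alpha_bar(t):
    """Signal retention exp(-integral_0^t beta(s) ds) for the linear schedule."""
    integral = BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t * t
    return math.exp(-integral)


def noise_coordinates(x0, t, rng):
    """Draw X_t given X_0 from the variance-preserving marginal."""
    a = alpha_bar(t)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
            for x in x0]


rng = random.Random(0)
x0 = [1.0, -0.5, 0.25]          # toy flattened coordinates
x_half = noise_coordinates(x0, 0.5, rng)
```

At $t = 0$ the coordinates are returned unchanged, and by $t = 1$ almost no signal remains, so a denoiser trained on such pairs has to recover structure from nearly pure Gaussian noise.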

3. Posterior Sampling Guided by Experimental Data

Posterior sampling is performed by converting the unconditional reverse SDE into a conditional SDE via addition of the crystallographic likelihood gradient:

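$dX_t = \left[-\tfrac{1}{2}\,\beta(t)\,X_t - \beta(t)\left(\nabla_{X_t}\log p_t(X_t \mid a) + \nabla_{X_t}\log p_t(y \mid X_t, a, c)\right)\right]dt + \sqrt{\beta(t)}\,d\bar{W}_t$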

The intractable $\nabla_{X_t}\log p_t(y \mid X_t, a, c)$ is approximated by evaluating crystallographic losses on the denoiser’s one-step prediction $\hat{X}_0(X_t)$:

Heteroscedastic Gaussian loss on normalized amplitudes ($E$-values),
Rice distribution loss to account for unknown phases (distinguishing acentric/centric reflections).
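
The Rice-type loss in the list above can be sketched as a negative log-likelihood with separate acentric and centric forms. This is a textbook-style version under simplifying assumptions: a single variance parameter stands in for the model-error term, no $\sigma_A$-style scaling or multiplicity factor is applied, and the function names are hypothetical. The Bessel function is evaluated by its power series, which is only suitable for moderate arguments.

```python
import math


def bessel_i0(x, terms=80):
    """Modified Bessel function I0 from its power series (moderate x only)."""
    total, term = 1.0, 1.0
    q = 0.25 * x * x
    for k in range(1, terms):
        term *= q / (k * k)      # ratio of consecutive series terms
        total += term
    return total


def rice_pdf_acentric(f_obs, f_calc, variance):
    """Density of |F_o| for an acentric reflection (2D complex Gaussian error)."""
    arg = 2.0 * f_obs * f_calc / variance
    return ((2.0 * f_obs / variance)
            * math.exp(-(f_obs ** 2 + f_calc ** 2) / variance)
            * bessel_i0(arg))


def rice_pdf_centric(f_obs, f_calc, variance):
    """Density of |F_o| for a centric reflection (1D Gaussian error, folded)."""
    return (math.sqrt(2.0 / (math.pi * variance))
            * math.exp(-(f_obs ** 2 + f_calc ** 2) / (2.0 * variance))
            * math.cosh(f_obs * f_calc / variance))


def rice_nll(f_obs, f_calc, variance, centric):
    """Negative log-likelihood for one reflection."""
    pdf = rice_pdf_centric if centric else rice_pdf_acentric
    return -math.log(pdf(f_obs, f_calc, variance))


# The loss is lower when the calculated amplitude is close to the observed one.
nll_close = rice_nll(2.0, 2.0, 0.5, centric=False)
nll_far = rice_nll(2.0, 1.0, 0.5, centric=False)
```

Because only amplitudes enter, the unknown phase is integrated out analytically; that integration is what produces the Bessel and hyperbolic-cosine terms.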

Combined guidance is expressed as:

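$\nabla_{X_t}\log p_t(y \mid X_t, a, c) \approx -\nabla_{X_t}\left[\lambda_{\mathrm{G}}\,\mathcal{L}_{\mathrm{Gauss}}(\hat{X}_0) + \lambda_{\mathrm{R}}\,\mathcal{L}_{\mathrm{Rice}}(\hat{X}_0)\right]$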

The guidance hyperparameters are set empirically. Each sampling step involves rigid-body alignment of $\hat{X}_0$ to a reference in the crystal frame, ensuring correct fractional coordinates during forward likelihood computation. This approach enables end-to-end generation of models that are simultaneously plausible under the learned prior and tightly consistent with measured diffraction data.
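
The guided reverse process can be illustrated on a one-dimensional toy problem. In this minimal sketch a standard-normal prior with a closed-form score stands in for Boltz-2, a scalar Gaussian likelihood stands in for the crystallographic losses, and the linear schedule is an assumption; none of these choices come from the source. The guidance term is the gradient of the log-likelihood evaluated at the one-step (Tweedie) prediction, which is the diffusion posterior sampling approximation.

```python
import math
import random

BETA_MIN, BETA_MAX = 0.1, 20.0   # hypothetical linear VP schedule


def beta(t):
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)


def alpha_bar(t):
    return math.exp(-(BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t * t))


def prior_score(x_t, t):
    """Score of p_t for an N(0, 1) prior; the VP process keeps it N(0, 1)."""
    return -x_t


def guided_sample(y, sigma, rng, n_steps=500, guided=True):
    """Euler-Maruyama integration of the reverse SDE with DPS-style guidance."""
    dt = 1.0 / n_steps
    x = rng.gauss(0.0, 1.0)                      # start from X_1 ~ N(0, 1)
    for i in range(n_steps, 0, -1):
        t = i * dt
        b, a = beta(t), alpha_bar(t)
        score = prior_score(x, t)
        guidance = 0.0
        if guided:
            # Tweedie one-step prediction of the clean sample.
            x0_hat = (x + (1.0 - a) * score) / math.sqrt(a)
            # Chain rule: d(x0_hat)/d(x_t) equals sqrt(a) for this linear score.
            guidance = (y - x0_hat) / sigma ** 2 * math.sqrt(a)
        drift = -0.5 * b * x - b * (score + guidance)
        # Step backward in time: subtract the drift, add fresh noise.
        x = x - drift * dt + math.sqrt(b * dt) * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)
guided = [guided_sample(2.0, 0.5, rng) for _ in range(200)]
unguided = [guided_sample(2.0, 0.5, rng, guided=False) for _ in range(200)]
mean_guided = sum(guided) / len(guided)
mean_unguided = sum(unguided) / len(unguided)
```

The unguided samples stay centred near zero, while the guided ones move toward the observation `y`. The approximation ignores the uncertainty of the one-step prediction, so the guided samples need not match the exact posterior.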

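4. Atomic Coordinate and B-Factor Refinement

Upon completion of diffusion-guided sampling, CrystalBoltz enters a brief, local, high-resolution refinement of both atomic coordinates ($X$) and B-factors ($B$):

$(X^{*}, B^{*}) = \arg\min_{X,\,B}\ \mathcal{L}_{\mathrm{ref}}(X, B;\, y)$

The refinement objective $\mathcal{L}_{\mathrm{ref}}$ is typically the crystallographic $R$-factor or correlation coefficient (CC):

$R = \dfrac{\sum_{h}\big|\,|F_o(h)| - k\,|F_c(h)|\,\big|}{\sum_{h}|F_o(h)|}$

where $|F_c(h)|$ are the amplitudes calculated from $(X, B)$ and $k$ is an overall scale factor.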

The forward model incorporates Debye–Waller factors $\exp(-B\,\sin^2\theta/\lambda^2)$ for B-factors, initialized from Boltz-2’s pLDDT via a Baek et al. mapping and clamped to a bounded range (in Å$^2$). The refinement uses Adam for 50–100 steps, periodically re-solving scale and solvent parameters. This phase corrects side-chain rotamers and B-factor distributions to reach crystallographic-quality R-factors efficiently.
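
The pieces of the refinement objective can be sketched in a few lines: a Debye–Waller attenuation, a least-squares overall scale, and the resulting R-factor. The reflections and amplitudes below are synthetic, the helper names are hypothetical, and a coarse grid search over a single shared B-factor replaces the Adam optimizer purely to keep the example dependency-free.

```python
import math


def debye_waller(b_factor, d_spacing):
    """exp(-B sin^2(theta) / lambda^2), using sin(theta)/lambda = 1 / (2 d)."""
    return math.exp(-b_factor / (4.0 * d_spacing ** 2))


def least_squares_scale(f_obs, f_calc):
    """Scale k that minimizes sum (|F_o| - k |F_c|)^2."""
    return (sum(o * c for o, c in zip(f_obs, f_calc))
            / sum(c * c for c in f_calc))


def r_factor(f_obs, f_calc):
    """Crystallographic R-factor after applying the least-squares scale."""
    k = least_squares_scale(f_obs, f_calc)
    return (sum(abs(o - k * c) for o, c in zip(f_obs, f_calc))
            / sum(f_obs))


# Synthetic reflections: resolution in Angstrom and B-free amplitudes.
d_spacings = [4.0, 3.0, 2.2, 1.8]
f_static = [120.0, 85.0, 60.0, 45.0]

# Synthetic observations generated with B = 25 A^2 and an arbitrary scale of 3.
f_obs = [3.0 * f * debye_waller(25.0, d) for f, d in zip(f_static, d_spacings)]


def r_for_b(b_factor):
    """R-factor of the model when every atom shares one B-factor."""
    f_calc = [f * debye_waller(b_factor, d)
              for f, d in zip(f_static, d_spacings)]
    return r_factor(f_obs, f_calc)


# Grid search in place of gradient-based refinement.
best_b = min([5.0, 15.0, 25.0, 35.0, 45.0], key=r_for_b)
```

The overall scale absorbs the factor of 3, so only the resolution-dependent falloff distinguishes the candidate B-factors. This is why scale parameters are re-solved alongside coordinate and B-factor updates.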

5. Experimental Benchmarks and Computational Performance

CrystalBoltz was evaluated across six PDB single-chain proteins (resolutions 1.69–2.20 Å; 164–306 residues): 8DWN, 4NTZ, 7O51, 7SEZ, 7VNX, and 1L63. The system was implemented in PyTorch on NVIDIA RTX A6000 GPUs, utilizing custom code for differentiable structure-factor calculation and integrating the Boltz-2 denoiser from the AlphaFold3 codebase.

A direct comparison with ROCKET [Fadini et al., 2026] reveals:

Method	Total runtime (min)	Key steps
ROCKET	∼376	3× MSA opt, long fine-tune, phenix.refine
CrystalBoltz	11.3	10.9 (phase 1, 200 sampling steps) + 0.4 (refinement, 50–100 steps)

CrystalBoltz realizes a 33.3× speed-up relative to existing experimentally guided pipelines, reducing structure determination from hours to approximately 11 minutes per target.
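
The quoted speed-up follows directly from the runtimes in the table, as this short check shows. The ROCKET figure is approximate in the source, so the ratio is approximate too.

```python
# Runtimes in minutes, taken from the comparison table.
rocket_minutes = 376.0            # approximate value
sampling_minutes = 10.9           # phase 1, guided sampling
refine_minutes = 0.4              # local refinement

crystalboltz_minutes = sampling_minutes + refine_minutes
speedup = rocket_minutes / crystalboltz_minutes

print(f"{crystalboltz_minutes:.1f} min, {speedup:.1f}x")  # prints "11.3 min, 33.3x"
```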

6. Quantitative Results and Comparative Evaluation

Performance metrics include all-atom RMSD, C$\alpha$ RMSD, $R_{\mathrm{work}}$, and $R_{\mathrm{free}}$. The table below summarizes mean results over the top 3 of 20 samples; each entry reads best prior baseline → CrystalBoltz:

PDB	RMSD (Å) ↓	R-factor ↓
8DWN	2.20 → 1.32	0.382 → 0.337
4NTZ	8.77 → 1.30	0.554 → 0.483
7O51	1.125 → 0.651	0.381 → 0.278
7SEZ	2.127 → 1.014	0.451 → 0.365
7VNX	1.113 → 0.590	0.321 → 0.328
1L63	0.940 → 0.661	0.344 → 0.309

CrystalBoltz achieved statistically significant best performance on four of six proteins. Relative to the best prior baseline (unguided Boltz-2 or ROCKET), RMSD improves on all six targets and the R-factor improves on five of six; for 7VNX it rises slightly, from 0.321 to 0.328. This demonstrates the method’s effectiveness at directly integrating experimental data into generative structural workflows.
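
The per-target changes can be tallied from the table itself. The sketch below assumes each arrow reads baseline → CrystalBoltz and uses only the values listed above; the variable names are illustrative.

```python
# (baseline, CrystalBoltz) pairs copied from the results table.
results = {
    "8DWN": {"rmsd": (2.20, 1.32), "r": (0.382, 0.337)},
    "4NTZ": {"rmsd": (8.77, 1.30), "r": (0.554, 0.483)},
    "7O51": {"rmsd": (1.125, 0.651), "r": (0.381, 0.278)},
    "7SEZ": {"rmsd": (2.127, 1.014), "r": (0.451, 0.365)},
    "7VNX": {"rmsd": (1.113, 0.590), "r": (0.321, 0.328)},
    "1L63": {"rmsd": (0.940, 0.661), "r": (0.344, 0.309)},
}


def relative_reduction(before, after):
    """Fractional decrease; negative values mean the metric got worse."""
    return (before - after) / before


rmsd_improved = [pdb for pdb, m in results.items()
                 if m["rmsd"][1] < m["rmsd"][0]]
r_improved = [pdb for pdb, m in results.items()
              if m["r"][1] < m["r"][0]]

for pdb, m in results.items():
    print(f"{pdb}: RMSD {relative_reduction(*m['rmsd']):+.1%}, "
          f"R-factor {relative_reduction(*m['r']):+.1%}")
```

The largest RMSD change is for 4NTZ, where the baseline is far from the reference structure. 7VNX is the only target whose R-factor does not decrease.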

7. Limitations and Prospective Extensions

Current limitations include a reliance on rigid-body alignment to a reference from molecular replacement, introducing a dependency that could potentially be obviated by integrating alignment into inference. All experiments to date use single chains in the asymmetric unit, though both the Boltz-2 prior and forward model are inherently chain-agnostic, suggesting straightforward extensibility to complexes and oligomers.

The posterior sampling strategy employs diffusion posterior sampling (DPS) as a proof-of-concept; more advanced techniques (e.g., DAPS, dual-diffusion) could offer improved guidance, especially in high-noise or highly nonlinear regimes. The core experimental-conditioning paradigm—learned prior, differentiable forward model, guided diffusion, and local refinement—could be directly adapted to modalities such as cryo-EM and NMR.

As AI-refined models are deposited into public structure databases, explicit provenance metadata will be critical to support reproducibility and prevent feedback loops in model training.

CrystalBoltz demonstrates a highly integrated approach that unifies powerful data-driven priors with physics-based likelihoods, providing end-to-end sampling and refinement that simultaneously elevates accuracy and efficiency for X-ray crystallographic structure determination (Kim et al., 15 May 2026).

References (1)

CrystalBoltz: End-to-End Protein Structure Determination via Experiment-Guided Diffusion for X-Ray Crystallography (2026)
