Boltz-2: Joint Structure & Binding Prediction

Updated 22 May 2026

Boltz-2 is a deep learning foundation model that integrates protein-ligand co-folding and binding affinity estimation using a three-stage transformer architecture.
It employs a diffusion-based structure module and an affinity module to generate 3D coordinates and quantitatively predict binding energies in a single framework.
Boltz-2 is widely used in virtual screening and drug discovery pipelines, where it improves on traditional docking methods but remains computationally expensive.

Boltz-2 is a foundation model for joint structure prediction and binding affinity estimation in biomolecular systems, integrating advances in protein–ligand “co-folding” and physics-motivated deep learning. Developed as a second-generation system building upon AlphaFold3 and earlier Boltz models, Boltz-2 introduces a three-stage transformer-based neural architecture for simultaneous protein and ligand structure prediction and binding affinity regression. It has rapidly become a standard benchmarking tool for structure-based drug discovery, virtual screening, and protein–ligand interaction inference, and has been adapted for protein–protein affinity prediction and several hybrid screening pipelines.

1. Conceptual Foundation and Model Architecture

Boltz-2 is designed to address the limitations of traditional docking and scoring workflows, which decouple protein structure prediction from ligand binding pose and affinity estimation. Boltz-2’s core innovation is its end-to-end “co-folding” paradigm, where protein backbone folding, side-chain positioning, and small-molecule ligand docking are solved jointly within a single deep-learning framework. The architecture consists of three principal modules:

Trunk Module (“PairFormer + MSA”): Encodes both protein (sequence, MSA features) and ligand (2D graph, atom- and charge-level features) inputs into a learned, stacked pairwise representation $P_{ij}$. The trunk recycles its outputs for multiple (typically five) iterations, refining the joint latent feature space by interleaving transformer-style updates (“PairFormer” blocks) with standard MSA attention on the protein side (Furui et al., 24 Aug 2025).
Structure Module: A diffusion-based coordinate predictor, closely related to the generative mechanisms in AlphaFold3 and other latent-time diffusion models (Sha, 9 May 2026). It generates 3D coordinates for all heavy atoms by progressive denoising of noisy initializations, informed by learned models of geometric and chemical interaction.
Affinity Module: A specialized regression head that operates on the final pairwise representations and explicit 3D coordinates to output (a) a scalar binding likelihood ($s_\text{bind}\in[0,1]$) and (b) a regressed binding free energy or affinity ($\Delta G_\text{pred}$, e.g., in kcal/mol or $\log_{10}\mathrm{IC}_{50}$ units) (Furui et al., 24 Aug 2025, Wan et al., 2 Mar 2026).

These components are trained jointly using a combination of regression and classification losses. The diffusion-based structure module is rate-limiting in computational cost, typically requiring ~15–20 s per complex on modern GPUs, even with only five trunk recycles (Furui et al., 24 Aug 2025, Elton et al., 5 May 2026).
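The recycling behaviour of the trunk can be illustrated with a toy loop. The sketch below is not the Boltz-2 implementation: `toy_trunk_block`, the tensor sizes, and the choice to add a layer-normalized copy of the previous output to the initial features are all assumptions made for illustration, in the style of AlphaFold-like recycling.

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize the last (channel) axis to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def toy_trunk_block(pair: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for one trunk pass: a residual channel update."""
    return pair + np.tanh(pair @ weight)

def run_trunk_with_recycling(pair_init: np.ndarray, weight: np.ndarray,
                             n_recycles: int = 5) -> np.ndarray:
    """Apply the trunk n_recycles times, feeding each output back as input."""
    prev = np.zeros_like(pair_init)
    for _ in range(n_recycles):
        # Recycling: the normalized previous output is added to the initial features.
        prev = toy_trunk_block(pair_init + layer_norm(prev), weight)
    return prev

rng = np.random.default_rng(0)
n_tokens, n_channels = 8, 16  # assumed sizes: tokens (residues + ligand atoms), feature width
pair_init = rng.normal(size=(n_tokens, n_tokens, n_channels))
weight = rng.normal(scale=0.1, size=(n_channels, n_channels))
pair_final = run_trunk_with_recycling(pair_init, weight, n_recycles=5)
print(pair_final.shape)
```

The pairwise tensor keeps its shape across iterations, which is what allows the same block to be reapplied a fixed number of times.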

2. Mathematical Formalism and Training Objectives

Boltz-2 leverages the denoising score-matching principles of latent-time diffusion models to generate plausible protein–ligand complex structures (Sha, 9 May 2026). The forward process $q(x_t \mid x_{t-1})$ incrementally corrupts atomic coordinates with Gaussian noise parameterized by a schedule $\{\beta_t\}$, while the learned reverse process $p_\theta(x_{t-1} \mid x_t)$ is modeled as a Gaussian with mean and variance predicted by the neural network. The learning objective is to regress the injected noise:

$L(\theta) = \mathbb{E}\Bigl\|\epsilon-\epsilon_\theta\bigl(\sqrt{\bar\alpha_t}\,x_0+\sqrt{1-\bar\alpha_t}\,\epsilon,\;t\bigr)\Bigr\|_2^2,$

where $\epsilon$ is Gaussian noise, $x_0$ is the true structure, and $\bar\alpha_t$ defines the noise schedule (Sha, 9 May 2026).
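A minimal numerical sketch of the noise-regression objective is given below. It assumes the standard DDPM convention $\bar\alpha_t=\prod_{s\le t}(1-\beta_s)$ and a linear $\beta$ schedule; the schedule endpoints, the five-atom toy structure, and the function names are illustrative assumptions, not values taken from Boltz-2.

```python
import numpy as np

def make_schedule(n_steps: int = 1000, beta_min: float = 1e-4, beta_max: float = 0.02):
    """Linear beta schedule (assumed) and its cumulative products alpha_bar_t."""
    betas = np.linspace(beta_min, beta_max, n_steps)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def noise_coordinates(x0: np.ndarray, t: int, alpha_bars: np.ndarray, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.normal(size=x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

def denoising_loss(eps_true: np.ndarray, eps_pred: np.ndarray) -> float:
    """Single-sample estimate of the squared error between true and predicted noise."""
    return float(np.sum((eps_true - eps_pred) ** 2))

rng = np.random.default_rng(0)
betas, alpha_bars = make_schedule()
x0 = rng.normal(size=(5, 3))          # toy "structure": 5 atoms with xyz coordinates
x_t, eps = noise_coordinates(x0, t=500, alpha_bars=alpha_bars, rng=rng)

# A real model would predict eps from (x_t, t); a zero predictor is used as a placeholder.
print(denoising_loss(eps, np.zeros_like(eps)))
```

Because $x_t$ is available in closed form from $x_0$, training can draw a random timestep per example without simulating the forward chain step by step.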

For binding affinity, the model combines a binary cross-entropy (BCE) loss for binder classification with a mean-squared-error (MSE) loss over measured affinities (expressed as binding constants or log IC$_{50}$), in an objective of the form:

$L_\text{aff} = \mathrm{BCE}\bigl(s_\text{bind},\,y\bigr) + \mathrm{MSE}\bigl(\Delta G_\text{pred},\,\Delta G_\text{exp}\bigr),$

where $y\in\{0,1\}$ is the binder label and $\Delta G_\text{exp}$ is the measured affinity (Furui et al., 24 Aug 2025, Wan et al., 2 Mar 2026).
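The combination of a binary cross-entropy term and an MSE term described above can be sketched as follows. The weights `w_cls` and `w_reg`, and the masking of compounds without a measured affinity (encoded here as NaN), are assumptions added for illustration; the source does not specify how the two terms are balanced.

```python
import numpy as np

def binary_cross_entropy(p: np.ndarray, y: np.ndarray, eps: float = 1e-7) -> float:
    """Mean BCE between predicted binding likelihoods p and 0/1 labels y."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

def mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error between predicted and measured affinities."""
    return float(np.mean((pred - target) ** 2))

def affinity_loss(s_bind, is_binder, dg_pred, dg_exp, w_cls=1.0, w_reg=1.0) -> float:
    """Weighted sum of classification and regression terms (weights are assumed)."""
    has_label = ~np.isnan(dg_exp)  # assumed: only measured compounds enter the regression
    reg = mse(dg_pred[has_label], dg_exp[has_label]) if has_label.any() else 0.0
    return w_cls * binary_cross_entropy(s_bind, is_binder) + w_reg * reg

s_bind = np.array([0.9, 0.2, 0.6])        # predicted binding likelihoods
is_binder = np.array([1.0, 0.0, 1.0])     # binary labels
dg_pred = np.array([-9.0, -5.0, -7.0])    # predicted affinities (kcal/mol, toy values)
dg_exp = np.array([-8.5, np.nan, -7.5])   # measured affinities; NaN means not measured
total = affinity_loss(s_bind, is_binder, dg_pred, dg_exp)
print(round(total, 4))
```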

Training data for Boltz-2 aggregates over one million experimental structures and ten million labeled binding measurements from public sources including BindingDB, ChEMBL, and PubChem (Wan et al., 2 Mar 2026, Elton et al., 5 May 2026).

3. Structure Generation, Sampling, and Inference

Boltz-2 offers both stochastic (ancestral) and continuous (ODE-based) sampling of 3D coordinates:

Ancestral sampling (Euler–Maruyama): Starts from a Gaussian prior and iterates backward through the noise schedule using the network-predicted mean and variance to denoise at each step.
Probability-flow ODE: Integrates the deterministic ODE representation of the diffusion process, yielding a single deterministic output for a given initial noise sample rather than a stochastic ensemble (Sha, 9 May 2026).
Inference workflow: Feature extraction (MSA computation, ligand graph prep), trunk recycling, structure module sampling, and affinity prediction are sequenced; the structure module dominates computational time, contributing ~75% of GPU runtime (Furui et al., 24 Aug 2025).
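The difference between the two sampling modes can be seen on a one-dimensional toy problem. The sketch below is not the Boltz-2 sampler: it assumes a DDPM-style discrete schedule, replaces the network with the analytically optimal noise predictor for Gaussian data, and uses a DDIM-style deterministic update as a stand-in for integrating the probability-flow ODE. All function names and parameters are illustrative.

```python
import numpy as np

def make_schedule(n_steps: int = 1000, beta_min: float = 1e-4, beta_max: float = 0.02):
    betas = np.linspace(beta_min, beta_max, n_steps)
    return betas, np.cumprod(1.0 - betas)

def exact_eps(x_t, alpha_bar, mean, std):
    """Optimal noise prediction when the data are N(mean, std^2); stands in for the network."""
    var_t = alpha_bar * std**2 + (1.0 - alpha_bar)
    return np.sqrt(1.0 - alpha_bar) * (x_t - np.sqrt(alpha_bar) * mean) / var_t

def ancestral_sample(x_T, betas, alpha_bars, mean, std, rng):
    """Stochastic reverse process: denoise, then inject fresh noise at every step but the last."""
    x = x_T.copy()
    for t in range(len(betas) - 1, -1, -1):
        eps_hat = exact_eps(x, alpha_bars[t], mean, std)
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(1.0 - betas[t])
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.normal(size=x.shape)
    return x

def deterministic_sample(x_T, alpha_bars, mean, std):
    """DDIM-style update with no injected noise: the output is a function of x_T only."""
    x = x_T.copy()
    for t in range(len(alpha_bars) - 1, -1, -1):
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps_hat = exact_eps(x, ab_t, mean, std)
        x0_hat = (x - np.sqrt(1.0 - ab_t) * eps_hat) / np.sqrt(ab_t)
        x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps_hat
    return x

rng = np.random.default_rng(0)
betas, alpha_bars = make_schedule()
x_T = rng.normal(size=2000)  # samples from the Gaussian prior
sde_samples = ancestral_sample(x_T, betas, alpha_bars, mean=2.0, std=0.5, rng=rng)
ode_samples = deterministic_sample(x_T, alpha_bars, mean=2.0, std=0.5)
print(sde_samples.mean(), ode_samples.mean(), ode_samples.std())
```

Rerunning `deterministic_sample` on the same `x_T` reproduces the same outputs exactly, whereas `ancestral_sample` gives a different draw for each random seed; both recover the target mean in this toy setting.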

The model supports single-instance inference (batch size 1 in the original implementation) and, in derivative pipelines like Boltzina, batching via omission of the structure sampler (Furui et al., 24 Aug 2025).

4. Performance Benchmarks and Evaluation

Boltz-2 has been benchmarked on diverse protein–ligand and protein–protein datasets:

Model Variant	RMSE (kcal/mol)	Spearman ρ	ROC-AUC	Throughput (s/mol)
Boltz-2 (full)	1.26–1.71	0.28–0.68	~0.81	16–24 (H100/RTX3060)
Boltzina (fast)	—	—	~0.75	1.4–2.3 (H100, batched)
DrugFormDTA (FT)	1.19 (pK)	0.701	—	0.03

Data aggregated from (Thaler et al., 26 Aug 2025, Furui et al., 24 Aug 2025, Elton et al., 5 May 2026)

Key findings:

On the MF-PCBA virtual screening assays, Boltz-2 achieves substantially higher average precision and enrichment factor than GNINA or Vina, with mean AP = 0.084 and ROC-AUC ~0.81 (Furui et al., 24 Aug 2025).
On curated antiviral targets, Boltz-2 attains Pearson r = 0.316 and RMSE = 1.59 pK, narrowly outperforming classical docking (GNINA r = 0.302, RMSE = 1.60 pK) but falling well below the best fine-tuned neural models (DrugFormDTA r = 0.701) (Elton et al., 5 May 2026).
In ABFE (absolute binding free energy) calculations, Boltz-2-derived starting structures robustly seed MD-based FEP workflows without requiring crystal structures, with RMSE on par with or only modestly exceeding crystal-structure-initialized FEP for many targets (Thaler et al., 26 Aug 2025).
For protein–protein affinity (Boltz-2-PPI), the structure-based head underperforms sequence-only models but yields complementary gains when combined with sequence-based embeddings (King et al., 6 Dec 2025).
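For readers unfamiliar with the metrics quoted above, the following sketch shows one common way to compute RMSE, Pearson r, Spearman ρ, ROC-AUC, and an enrichment factor with NumPy alone. The rank-based formulas assume no tied values, and the toy arrays are invented for illustration; they are not benchmark data from the cited studies.

```python
import numpy as np

def rmse(pred, true) -> float:
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(true)) ** 2)))

def pearson_r(a, b) -> float:
    return float(np.corrcoef(a, b)[0, 1])

def ranks(x) -> np.ndarray:
    """Ordinal ranks starting at 1 (assumes no ties)."""
    x = np.asarray(x)
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r

def spearman_rho(a, b) -> float:
    """Spearman correlation as the Pearson correlation of ranks."""
    return pearson_r(ranks(a), ranks(b))

def roc_auc(scores, labels) -> float:
    """Rank-sum (Mann-Whitney U) form of ROC-AUC; assumes no tied scores."""
    labels = np.asarray(labels)
    r = ranks(scores)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return float((r[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

def enrichment_factor(scores, labels, fraction: float = 0.01) -> float:
    """Hit rate in the top-scoring fraction divided by the overall hit rate."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    n_top = max(1, int(round(fraction * len(scores))))
    top = np.argsort(-scores)[:n_top]
    return float(labels[top].mean() / labels.mean())

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05])
labels = np.array([1, 0, 1, 0, 0, 0, 0, 0, 0, 0])
print(roc_auc(scores, labels), enrichment_factor(scores, labels, fraction=0.2))
print(spearman_rho([1, 2, 3, 4], [1.1, 1.9, 3.5, 3.4]))
```

Correlation metrics reward correct ordering across the whole affinity range, while the enrichment factor looks only at the top of the ranked list, which is why a model can score well on one and poorly on the other.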

5. Integration into Downstream Computational Pipelines

Boltz-2’s outputs directly seed both virtual and physical simulation workflows:

Virtual screening: Rank-ordering and triage of large ligand libraries by predicted ΔG or –log IC$_{50}$, frequently within two-stage workflows (Boltzina, Rhizome OS-1) in which efficient surrogate models perform the initial screen and Boltz-2 rescores the shortlisted compounds (Furui et al., 24 Aug 2025, Wang et al., 8 Apr 2026).
Physics-based simulation setup: Generation of clash-free, physically plausible 3D protein–ligand structures for input into MD-based ABFE and ESMACS calculations. Boltz-2 structures, when passed through additional preparation steps (e.g., Spruce pipeline, POSIT re-docking), yield high rates of simulation-quality starting models (Thaler et al., 26 Aug 2025, Wan et al., 2 Mar 2026).
Agent-based design systems: Used as a “physics-informed scoring” layer in adaptive inverse-design platforms, e.g., Rhizome OS-1, with per-target calibration against ChEMBL affording reliable binder prioritization (Spearman ρ=–0.53 to –0.64, ROC AUC=0.88–0.93) (Wang et al., 8 Apr 2026).
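The two-stage screening pattern mentioned above, a cheap surrogate pass followed by expensive rescoring of a shortlist, can be sketched generically. Nothing here calls Boltz-2 or any real scoring library: `toy_fast_score` and `toy_slow_score` are hypothetical placeholders, and the convention that higher scores are better is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

def two_stage_screen(
    ligands: Sequence[str],
    fast_score: Callable[[str], float],
    slow_score: Callable[[str], float],
    keep_fraction: float = 0.1,
) -> List[Tuple[str, float]]:
    """Rank all ligands with the cheap scorer, then rescore only the top fraction."""
    n_keep = max(1, int(len(ligands) * keep_fraction))
    shortlist = sorted(ligands, key=fast_score, reverse=True)[:n_keep]
    rescored = [(lig, slow_score(lig)) for lig in shortlist]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

slow_calls: List[str] = []  # records how often the expensive scorer is invoked

def toy_fast_score(name: str) -> float:
    """Hypothetical cheap surrogate: prefers higher ligand indices."""
    return float(name[3:])

def toy_slow_score(name: str) -> float:
    """Hypothetical expensive scorer: prefers ligands near index 17."""
    slow_calls.append(name)
    return -abs(float(name[3:]) - 17.0)

library = [f"lig{i}" for i in range(20)]
ranked = two_stage_screen(library, toy_fast_score, toy_slow_score, keep_fraction=0.25)
print(ranked[0][0], len(slow_calls))
```

The expensive scorer is called only on the shortlist, so total cost scales with `keep_fraction` rather than with the full library size; the trade-off is that any true binder the surrogate ranks poorly is never rescored.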

6. Limitations, Sensitivity Analyses, and Future Directions

Boltz-2’s primary limitations arise from the compression of affinity ranges, inconsistent multi-head outputs, and structural pose uncertainty:

Weak to moderate global correlation with physics-based free-energy calculations (e.g., ESMACS r = 0.24–0.45) and substantially lower overlap among top-100 ranked compounds, reflecting lack of energetic resolution for lead optimization (Wan et al., 2 Mar 2026).
Limited generalization to out-of-distribution targets, insensitivity to subtle binding-site mutations, and inability to resolve activity cliffs in SAR (Elton et al., 5 May 2026).
Highly resource-intensive sampling (20–90 s/mol), restricting throughput unless the structure module is omitted or approximated (Boltzina, batched inference).
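A back-of-the-envelope calculation makes the throughput constraint concrete. The sketch below uses per-compound timings quoted in this article (20 s/mol for full sampling, 1.4 s/mol for batched Boltzina on an H100); the one-million-compound library size and the assumption of perfect linear scaling across GPUs are hypothetical.

```python
def gpu_days(n_compounds: int, seconds_per_compound: float, n_gpus: int = 1) -> float:
    """Wall-clock days to score a library, assuming perfect scaling across GPUs."""
    seconds_per_day = 86_400
    return n_compounds * seconds_per_compound / seconds_per_day / n_gpus

library_size = 1_000_000  # hypothetical library size
full = gpu_days(library_size, 20.0)    # full structure sampling
fast = gpu_days(library_size, 1.4)     # structure module omitted, batched
print(f"full: {full:.1f} GPU-days, fast: {fast:.1f} GPU-days")
```

Under these assumptions the full model needs roughly 231 GPU-days for a million compounds versus about 16 for the batched variant, which is why the structure module is the usual target for approximation.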

Ablation studies indicate that disabling “Boltz steering”—the application of physics-inspired potentials during diffusion—reduces structure quality in binding pockets and modestly degrades predictive accuracy on kinases (Elton et al., 5 May 2026). Ongoing work is directed at (a) reducing runtime through structural approximation, (b) incorporating explicit solvation and water networks, (c) improving the affinity prediction head through few-shot fine-tuning on specific targets, and (d) developing “mini-Boltz” surrogates for ultra-large scale library pre-screening (Elton et al., 5 May 2026, Furui et al., 24 Aug 2025).

7. Applications and Impact in Drug Discovery

Boltz-2 is now used in early-stage drug discovery campaigns for target-agnostic virtual screening and protein–ligand docking, and as a preprocessor for structure-based simulations when no X-ray data are available. Its ability to generate simulation-quality structures from sequence and ligand input extends structure-based affinity estimation to targets for which crystal structures are unavailable (Thaler et al., 26 Aug 2025, Wan et al., 2 Mar 2026). Real-world use includes antiviral repurposing, oncology lead optimization, and protein–protein interaction engineering (Elton et al., 5 May 2026, King et al., 6 Dec 2025, Sha, 9 May 2026).

In summary, Boltz-2 unifies deep-learned structure prediction and affinity estimation in a single model, providing a flexible foundation for structure-based design and computational screening pipelines. Its empirical performance is at or above that of traditional docking and competitive with other ML-based binding predictors. Future work will focus on accuracy, physical expressiveness, cost/performance optimization, and deeper integration with downstream simulation platforms.