
Temperature-Annealed Boltzmann Generators

Updated 27 January 2026
  • Temperature-Annealed Boltzmann Generators (TA-BG) are generative models that use temperature-controlled invertible flows and annealing protocols to efficiently sample high-dimensional Boltzmann distributions.
  • TA-BG mitigates common challenges like mode collapse and poor mixing through importance reweighting, continuous-flow methods, and adaptive annealing techniques.
  • TA-BG has demonstrated superior performance in free energy estimation and kinetic inference with fewer energy evaluations compared to traditional simulation methods.

A Temperature-Annealed Boltzmann Generator (TA-BG) is a generative modeling framework that synthesizes equilibrium samples from a Boltzmann distribution at arbitrary temperatures using deep invertible neural networks, typically normalizing flows, trained with explicit control over temperature variables. TA-BGs systematically address the challenge of mode collapse and poor mixing in high-dimensional, multimodal energy landscapes by integrating temperature-conditioned generative models with principled annealing and reweighting protocols. This approach enables highly efficient, unbiased sampling, free energy estimation, and kinetic inference across a spectrum of thermodynamic states, with applications ranging from molecular simulation to machine learning.

1. Theoretical Foundation: Temperature-Steerable Flows

TA-BGs center on the construction of invertible mappings $f_T$ that transform a tractable base distribution (such as a Gaussian with $T$-dependent variance or a fixed uniform prior) to the Boltzmann target at a chosen temperature $T$, $\pi_T(x) \propto \exp[-U(x)/T]$. Two principal forms have been developed:

  • (A) Gaussian Prior, Volume-Preserving Flow: $z \sim \mathcal{N}(0, TI)$ and a flow $f_T(z)$ with $\big|\det \partial f_T/\partial z\big| = 1$. The resulting density $p_X^T(x)$ exactly captures the temperature scaling of the Boltzmann distribution via the change in prior variance.
  • (B) Uniform Prior, Temperature-Conditioned Flow: A fixed $p_Z(z) = 1$ on $[0,1]^d$ and a neural spline flow with parameters scaled in proportion to $\beta = 1/T$. Here, all $T$-dependence is absorbed by the conditioning of flow parameters, enabling $f_T$ to interpolate between densities at different temperatures (Dibak et al., 2021, Dibak et al., 2020).
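
As a concrete illustration of form (A), the numpy sketch below uses a toy additive shift as the volume-preserving map (an assumption of this example, not an architecture from the literature) to show how all temperature dependence can live in the prior variance when the flow has a unit Jacobian:

```python
import numpy as np

def log_prior(z, T):
    """Log-density of the T-scaled Gaussian base, z ~ N(0, T I)."""
    d = z.shape[-1]
    return -0.5 * np.sum(z**2, axis=-1) / T - 0.5 * d * np.log(2 * np.pi * T)

def model_log_density(x, T, shift):
    """Model density under form (A): an additive shift is volume-preserving
    (|det J| = 1), so log p_X^T(x) = log p_Z^T(f_T^{-1}(x)) with no Jacobian term."""
    z = x - shift                  # inverse of the toy flow f_T(z) = z + shift
    return log_prior(z, T)

# All T-dependence enters through the prior variance, mirroring
# pi_T(x) ∝ exp[-U(x)/T] for the quadratic U(x) = ||x||^2 / 2.
x = np.array([1.0, -2.0])
lp_T1 = model_log_density(x, T=1.0, shift=np.zeros(2))
lp_T2 = model_log_density(x, T=2.0, shift=np.zeros(2))
```

Doubling $T$ halves the quadratic term of the log-density (up to normalization), which is exactly the Boltzmann temperature scaling for a harmonic potential.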

The loss function combines maximum likelihood (ML) on data at each $T$ and the reverse Kullback-Leibler (KL) divergence, termed the "energy-based loss", which encourages the model to approach the target Boltzmann density: $L(\theta) = \sum_{T \in \mathcal{F}} \left[(1-\lambda) L_{\mathrm{ML}}^{T}(\theta) + \lambda L_{\mathrm{KL}}^{T}(\theta)\right]$, where $\lambda$ interpolates between the ML and KL objectives (Dibak et al., 2021).
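
A minimal numpy sketch of this combined objective, using a toy harmonic potential and a one-parameter Gaussian standing in for the flow (both illustrative assumptions, not the papers' architectures):

```python
import numpy as np

rng = np.random.default_rng(0)

def U(x):
    """Toy harmonic potential (illustrative stand-in), U(x) = x^2 / 2."""
    return 0.5 * x**2

def nll(x_data, sigma):
    """ML term: mean negative log-density of data under a N(0, sigma^2) model."""
    return np.mean(0.5 * (x_data / sigma) ** 2 + np.log(sigma) + 0.5 * np.log(2 * np.pi))

def reverse_kl(sigma, T, n=10_000):
    """Energy-based term: E_q[U(x)/T + log q(x)], i.e. reverse KL up to log Z."""
    x = sigma * rng.standard_normal(n)
    log_q = -0.5 * (x / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
    return np.mean(U(x) / T + log_q)

def combined_loss(x_data, sigma, T, lam):
    """L = (1 - lambda) * L_ML + lambda * L_KL at a single temperature T."""
    return (1 - lam) * nll(x_data, sigma) + lam * reverse_kl(sigma, T)

x_data = rng.standard_normal(256)   # stand-in for equilibrium data at T = 1
loss = combined_loss(x_data, sigma=1.0, T=1.0, lam=0.5)
```

The ML term needs data but no energy evaluations; the reverse-KL term needs only the energy function, which is why $\lambda$ is typically ramped up after an initial data-driven stage.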

2. Annealing Protocols and Training Algorithms

Temperature annealing in a TA-BG is implemented by incrementally transforming a model trained at a high $T$ (broad, connected probability landscape) to the target $T$ (potentially rugged, multimodal). The principal methods are:

  • Stepwise Annealing with Importance Reweighting: Training initially at $T_1$ ($T_1 > T_{\mathrm{target}}$), drawing samples, and then applying normalized importance weights:

$$w_j = \frac{p_{T_{i+1}}(x_j)}{q_\theta(x_j; T_i)} = \frac{\exp[-\beta_{i+1} U(x_j)] / Z(\beta_{i+1})}{q_\theta(x_j; T_i)}$$

This forms a buffered dataset for ML or forward-KL training at each $T_{i+1}$. The process iterates until the target temperature is reached, followed by fine-tuning at $T_{\mathrm{target}}$ (Schopmans et al., 31 Jan 2025).

  • Constraint-Driven Schedules: Recent advances (e.g., Constrained Mass Transport, CMT) learn the intermediate inverse temperatures $\beta_t$ dynamically by imposing explicit constraints on the KL divergence and entropy decay between successive distributions. This procedure yields an optimized path that ensures overlap between transformations, substantially improving effective sample size (ESS) and mode coverage over manual geometric schedules (Klitzing et al., 21 Oct 2025).
  • Continuous-Flow Approaches: Methods such as Thermodynamic Interpolation (TI) utilize continuous normalizing flows (CNFs), training via stochastic-interpolant regression losses to interpolate directly between temperatures $T_A$ and $T_B$ over a continuous temperature axis. This enables generalization to both interpolated and extrapolated $T$ values, supporting high-fidelity equilibrium and kinetic statistics generation via a single model (Moqvist et al., 2024).
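
The stepwise reweighting protocol above can be sketched as follows; the harmonic energy and the Gaussian standing in for $q_\theta$ are illustrative assumptions, and self-normalization makes the unknown $Z(\beta_{i+1})$ cancel:

```python
import numpy as np

rng = np.random.default_rng(1)

def U(x):
    """Toy harmonic energy (stand-in for a real potential)."""
    return 0.5 * np.sum(x**2, axis=-1)

def annealing_step(sample_q, log_q, beta_next, n=5000):
    """One reweighting step: draw from the current model q_theta, then
    self-normalize weights toward the colder target at beta_next; the
    unknown Z(beta_next) cancels in the normalization."""
    x = sample_q(n)
    log_w = -beta_next * U(x) - log_q(x)
    log_w -= log_w.max()          # stabilize before exponentiating
    w = np.exp(log_w)
    return x, w / w.sum()

# Stand-in model "trained" at T_i = 2: for the harmonic U this is N(0, 2I) in 2D.
sample_q = lambda n: np.sqrt(2.0) * rng.standard_normal((n, 2))
log_q = lambda x: -0.25 * np.sum(x**2, axis=-1) - np.log(4 * np.pi)
x, w = annealing_step(sample_q, log_q, beta_next=1.0)

# The reweighted second moment should approach the beta = 1 value, E[x_k^2] = 1.
second_moment = np.sum(w[:, None] * x**2, axis=0)
```

The weighted pairs `(x, w)` form the buffered dataset used for ML or forward-KL training at the next rung of the ladder.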

3. Model Architectures and Conditioning

TA-BGs employ a range of architectural motifs to represent temperature-parameterized transformations:

  • Explicit Conditioning: Temperature (or $\beta = 1/T$) is concatenated to the inputs or parameters of every coupling block in RealNVP, NICE, NSF, or transformer-based normalizing flows. This facilitates explicit $T$-steerability (Dibak et al., 2021, Dibak et al., 2020).
  • Permutational and Symmetry Constraints: In scenarios involving indistinguishable particles or spatial symmetry (e.g., solid-liquid coexistence, molecular clusters), the flow is designed to be permutation-equivariant and/or SE(3)-equivariant, often using transformer or message-passing mechanisms (Schebek et al., 2024, Moqvist et al., 2024, Dern et al., 3 Sep 2025).
  • Base Measure Adaptation: Latent priors are assigned $T$-dependent width, such as $z \sim \mathcal{N}(0, TI)$, ensuring that the volume transformation closely matches the target at every $T$ (Dibak et al., 2020, Dibak et al., 2021).

4. Sampling Procedures and Reweighting

Sampling from a TA-BG at a desired $T$ proceeds by:

  • Drawing $z$ from the base prior (with $T$-dependent covariance if required)
  • Passing $z$ through $f_T(z)$ to obtain $x$
  • Assigning an importance weight:

$$w(x) = \exp\left[-\frac{U(x)}{T} - \log p_X^T(x)\right]$$

to yield unbiased observables via importance sampling, regardless of whether $p_X^T$ exactly matches the true Boltzmann target (Dibak et al., 2021, Schopmans et al., 31 Jan 2025).
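
This reweighting is typically applied in self-normalized form, where the partition function cancels; a minimal numpy sketch (the harmonic target and the deliberately biased Gaussian model are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

def reweighted_expectation(x, log_p_model, U, T, observable):
    """Self-normalized importance sampling with w(x) ∝ exp[-U(x)/T - log p_X^T(x)];
    the unknown partition function cancels after normalization."""
    log_w = -U(x) / T - log_p_model(x)
    log_w -= log_w.max()          # log-domain stabilization
    w = np.exp(log_w)
    w /= w.sum()
    return np.sum(w * observable(x))

# Deliberately mismatched model: N(0, 1.5^2) for a harmonic target U(x) = x^2/2
# at T = 1, whose exact second moment is 1 despite the model bias.
x = 1.5 * rng.standard_normal(20_000)
log_p = lambda x: -x**2 / 4.5 - np.log(1.5) - 0.5 * np.log(2 * np.pi)
second_moment = reweighted_expectation(
    x, log_p, U=lambda x: 0.5 * x**2, T=1.0, observable=lambda x: x**2
)
```

Even though the model's second moment is 2.25, the reweighted estimate recovers the target value, illustrating the unbiasedness claim.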

TA-BGs are further deployed as proposal engines in hybrid MCMC or generalized-ensemble (e.g., parallel tempering) frameworks, with the flow used for quasi-global moves at each replica's temperature, and swaps accepted according to standard detailed-balance criteria (Dibak et al., 2021, Dibak et al., 2020).
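The quasi-global flow move corresponds to a Metropolized independence sampler; a minimal sketch, with a simple Gaussian standing in for the flow proposal (an assumption of this example):

```python
import numpy as np

rng = np.random.default_rng(3)

def flow_mcmc(log_target, sample_flow, log_flow, x0, n_steps):
    """Metropolized independence sampler with the flow as a global proposal:
    accept x' with prob min(1, pi(x') q(x) / (pi(x) q(x'))), which satisfies
    detailed balance with respect to the target."""
    x, n_accept, chain = x0, 0, []
    for _ in range(n_steps):
        x_prop = sample_flow()
        log_alpha = (log_target(x_prop) - log_target(x)
                     + log_flow(x) - log_flow(x_prop))
        if np.log(rng.random()) < log_alpha:
            x, n_accept = x_prop, n_accept + 1
        chain.append(x)
    return np.array(chain), n_accept / n_steps

# Toy stand-ins: target N(0,1); "flow" proposal N(0, 1.2^2), slightly too wide.
log_target = lambda x: -0.5 * x**2          # unnormalized is fine for MH
sample_flow = lambda: 1.2 * rng.standard_normal()
log_flow = lambda x: -x**2 / 2.88
chain, acc_rate = flow_mcmc(log_target, sample_flow, log_flow, 0.0, 5000)
```

The closer the flow density is to the target, the higher the acceptance rate, which is exactly what annealed training of the flow is meant to achieve.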

5. Empirical Performance, Evaluation, and Benchmarks

TA-BGs have demonstrated marked advantages in empirical studies:

System        Method   Energy Evals (↓)   NLL (↓)     ESS (↑)   Ram. KLD (↓)   Ram. KLD w. RW (↓)
Dipeptide     FAB      2.13×10^8          -213.653    94.82%    1.50e-3        1.25e-3
Dipeptide     TA-BG    7.56×10^7          -213.665    95.60%    1.92e-3        1.36e-3
Tetrapeptide  FAB      2.13×10^8          -330.104    63.90%    6.61e-3        1.25e-3
Tetrapeptide  TA-BG    7.56×10^7          -330.113    62.47%    2.67e-3        1.94e-3
Hexapeptide   FAB      4.20×10^8          -501.275    14.69%    2.14e-2        1.13e-2
Hexapeptide   TA-BG    3.08×10^8          -501.523    14.84%    8.61e-3        8.57e-3

TA-BG achieves comparable or superior negative log-likelihood (NLL) and effective sample size (ESS) relative to flow annealed importance sampling bootstrap (FAB), with up to threefold fewer energy evaluations (Schopmans et al., 31 Jan 2025). Only TA-BG resolves all metastable states in the high-dimensional hexapeptide landscape without mode collapse.

Evaluation is further supported by:

  • Effective sample size (ESS) derived from autocorrelation analysis
  • Free energy differences computed via Zwanzig estimators, TFEP, or Bennett's acceptance ratio (BAR), with TA-BG and related TI methods yielding agreement with reference MD within $0.01$–$0.2\,k_B T$ (Moqvist et al., 2024)
  • Kinetic rates and relaxation times via generator extended dynamic mode decomposition (gEDMD) using samples from the generator at arbitrary $T$, recovering Arrhenius-like kinetics in unseen temperature regimes (Moqvist et al., 2024)
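
The Zwanzig (free-energy perturbation) estimator mentioned above can be sketched on a harmonic toy problem with a known analytic answer (the potentials here are illustrative choices of this example):

```python
import numpy as np

rng = np.random.default_rng(4)

def zwanzig_delta_f(U_a, U_b, x_a, T=1.0):
    """Zwanzig free-energy perturbation estimator:
    Delta F = -T log E_A[exp(-(U_B - U_A)/T)], from equilibrium samples of A."""
    du = (U_b(x_a) - U_a(x_a)) / T
    m = (-du).max()               # log-sum-exp stabilization
    return -T * (m + np.log(np.mean(np.exp(-du - m))))

# Harmonic toy with a known answer: U_A = x^2/2 (Z_A = sqrt(2 pi)),
# U_B = x^2 (Z_B = sqrt(pi)), so Delta F = 0.5 * log(2) ≈ 0.3466 at T = 1.
x_a = rng.standard_normal(200_000)          # exact samples from exp(-U_A)
dF = zwanzig_delta_f(lambda x: 0.5 * x**2, lambda x: x**2, x_a)
```

In practice the "samples from A" are the reweighted outputs of the generator, and BAR is preferred when samples from both endpoints are available.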

6. Extensions and Alternative Realizations

Beyond vanilla deep-learning flows, alternate architectures enable temperature annealing:

  • Quantum- and Analog-Inspired Annealers: Systems such as SimCIM (numerical quantum-inspired annealer) and hardware-driven diabatic quantum annealing (DQA) replicate Boltzmann sampling at adjustable temperatures using physical or simulated network dynamics. Analytic relations allow exact control or calibration of the output temperature by adjusting time-dependent parameters (pump rates, annealing times), with temperature-annealed samples used in training generative models (e.g., RBM, fully-connected BMs) (Ulanov et al., 2019, Kim et al., 11 Sep 2025).
  • Energy-Only Annealed CNFs: Energy-Weighted Flow Matching (EWFM) and its annealed variant (aEWFM) extend TA-BGs by training CNFs entirely via energy evaluations using importance-sampled regression over a progressively cooled temperature ladder. aEWFM has demonstrated up to $10^3\times$ reductions in required energy evaluations over previous energy-only methods while maintaining or exceeding sample quality metrics (NLL, Wasserstein distance) on hard many-body systems (Dern et al., 3 Sep 2025).
  • Conditional and Phase-Diagram Flows: Thermodynamic variables, including pressure, are conditioned into the flow, enabling TA-BGs to generate full phase diagrams (e.g., Lennard-Jones solid–liquid coexistence). This approach reliably yields >60% Kish ESS and melting-temperature predictions accurate to $|\Delta T_{\mathrm{melt}}| \lesssim 0.01$ (dimensionless LJ units), with a 5-fold reduction in total energy evaluations compared to MD+MBAR baselines (Schebek et al., 2024).
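
The Kish effective sample size used as a quality metric above is a simple weight statistic; a minimal implementation:

```python
import numpy as np

def kish_ess(weights):
    """Kish effective sample size, (sum w)^2 / sum(w^2): equals n for uniform
    weights and approaches 1 as the weights degenerate onto a single sample."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

uniform = np.ones(100)
skewed = np.array([1.0] + [1e-6] * 99)      # nearly all mass on one sample
```

Reported as a percentage of the batch size, it directly quantifies how much of the generated batch survives importance reweighting.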

7. Implementation, Diagnostics, and Best Practices

Standard implementation guidelines for TA-BGs include:

  • Architecture: Typical flow depths are 5–10 coupling layers for toy systems and 30–50 for molecular systems, employing MLPs with 128–256 hidden units (ReLU/tanh activations).
  • Conditioning: $T$ (or $\beta$) is injected either as a scalar or a positional embedding in all conditioner nets.
  • Optimization: Adam with learning rate $\sim 10^{-4}$ and weight decay $10^{-6}$; an initial ML-only stage is followed by a gradual $\lambda$ ramp to the combined ML/KL loss, with batch sizes of 512–1024 (Dibak et al., 2021).
  • Annealing Schedule: Geometric spacing in $\beta = 1/T$ is favored for parallel tempering; for CMT and related methods, constraints determine an adaptive schedule.
  • Diagnostics: Monitor swap rates in PT ($\sim$20–40%), histogram overlap, marginal distribution matching, and effective sample size.
  • Sampling: Flows are combined with short-burst MCMC or used in importance-weighted estimation for unbiased statistics.
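
The geometric spacing in $\beta = 1/T$ recommended above can be generated directly; a minimal sketch of such a temperature ladder (the endpoint values are illustrative):

```python
import numpy as np

def geometric_beta_ladder(T_high, T_target, n_rungs):
    """Temperature ladder with geometric spacing in beta = 1/T: equal
    multiplicative increments from beta_high to beta_target, a common
    default for parallel-tempering replica ladders."""
    betas = np.geomspace(1.0 / T_high, 1.0 / T_target, n_rungs)
    return 1.0 / betas

# Monotonically cooling ladder from T = 10 down to the target T = 1.
ladder = geometric_beta_ladder(T_high=10.0, T_target=1.0, n_rungs=5)
```

Constant multiplicative steps in $\beta$ keep the overlap between adjacent distributions roughly uniform, which is why this spacing is the usual manual baseline that adaptive schedules such as CMT improve upon.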

These practices underpin robust, scalable training and sampling across a wide range of systems in molecular sciences, statistical physics, and energy-based machine learning.


