Likelihood-Free Temperature Sampling
- Likelihood-free temperature sampling is a family of methods that leverages temperature parameters to explore complex, multimodal distributions without computing explicit likelihoods or partition functions.
- Flow-based techniques like temperature-annealed Boltzmann generators and temperature-steerable flows use annealing and importance reweighting to overcome high-dimensional barriers and improve sampling efficiency.
- These approaches empower applications in statistical physics, Bayesian inference, and generative modeling by reducing computational cost and enhancing exploration of energy landscapes.
Likelihood-free temperature sampling encompasses a family of strategies for sampling, optimization, and statistical inference over distributions parameterized by temperature (or analogous annealing parameters), without requiring explicit computation of likelihoods, partition functions, or equilibrium configurations. It has broad applications in statistical physics, Bayesian inference, generative modeling, and simulation-based inference. The unifying theme is leveraging temperature (or generalized control parameters) to control distributional smoothness and connectivity, thereby improving sampling, posterior exploration, or density estimation while sidestepping key computational bottlenecks of traditional likelihood-based approaches.
1. Mathematical Foundations and Motivation
Let $p_T(x) = Z_T^{-1}\exp\!\big(-E(x)/(k_B T)\big)$ define a Boltzmann-type density, or more generally a member of a temperature-parameterized exponential family. Direct sampling or inference over such $p_T$, especially in the presence of multimodal, high-dimensional landscapes, faces major challenges:
- Intractable partition function: $Z_T = \int \exp\!\big(-E(x)/(k_B T)\big)\,dx$ is typically unavailable, so $p_T$ is known only up to normalization.
- Equilibrium training data is often unavailable or computationally prohibitive to generate.
- Traditional MCMC and MD approaches exhibit poor mixing and slow barrier crossing at low $T$.
Likelihood-free temperature sampling approaches circumvent explicit likelihood/partition evaluation (or training on equilibrium data) via techniques including stochastic annealing, flow-based models with temperature parametrization, marginal-auxiliary tempering, temperature-extended nested sampling, and buffer-based reweighting or parametrization.
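To make the role of $T$ concrete before turning to specific methods, the following minimal sketch (an illustrative double-well energy and grid normalization, not drawn from any cited work) shows how raising the temperature reconnects modes that are effectively isolated at low $T$:

```python
import numpy as np

def energy(x):
    # Illustrative double-well energy with modes at x = -1 and x = +1.
    return (x**2 - 1.0)**2

def tempered_density(x, T):
    # Unnormalized Boltzmann weight exp(-E(x)/T) (k_B = 1), normalized on the
    # grid; in realistic settings Z_T is intractable and never computed.
    p = np.exp(-energy(x) / T)
    return p / (p.sum() * (x[1] - x[0]))

x = np.linspace(-2.5, 2.5, 1001)
for T in (0.05, 0.5, 2.0):
    p = tempered_density(x, T)
    barrier = p[np.argmin(np.abs(x))] / p.max()  # relative density at x = 0
    print(f"T={T:4.2f}: relative barrier density = {barrier:.2e}")
# At low T the modes are nearly disconnected; raising T restores connectivity,
# which is exactly what tempering-based samplers exploit.
```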
2. Flow-based Methods for Likelihood-Free Temperature Sampling
2.1 Temperature-Annealed Boltzmann Generators (TA-BG)
TA-BG (Schopmans et al., 31 Jan 2025) exemplifies likelihood-free temperature sampling with normalizing flows for molecular equilibrium distributions:
- Phase 1: Reverse KL training at a high temperature $T_{\mathrm{high}}$. The flow $q_\theta$ is fit to $p_{T_{\mathrm{high}}}$ using
$$\mathcal{L}_{\mathrm{RKL}}(\theta)=\mathbb{E}_{x\sim q_\theta}\!\left[\frac{E(x)}{k_B T_{\mathrm{high}}}+\log q_\theta(x)\right],$$
which requires only energy evaluations.
This high-temperature regime prevents mode collapse and ensures ergodic coverage of phase space.
- Phase 2: Iterative, buffered annealing to the target temperature $T_{\mathrm{target}}$:
  - At each step, sample from the flow, compute importance weights against the Boltzmann target at the next temperature, $w_i \propto \exp\!\big(-E(x_i)/(k_B T_{k+1})\big)/q_\theta(x_i)$, and resample a buffer according to $w_i$.
  - Fine-tune the flow at $T_{k+1}$ by maximum likelihood (forward KL) on the resampled buffer.
- Final: Optionally fine-tune at $T_{\mathrm{target}}$ with a final round of reweighting.
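A schematic of this annealed, buffered loop is sketched below. A trivial Gaussian model (whose moment-matching fit is its exact maximum-likelihood estimate) stands in for the normalizing flow, and the double-well energy and temperature schedule are illustrative assumptions, not the TA-BG molecular setup:

```python
import numpy as np
rng = np.random.default_rng(0)

def energy(x):
    # Illustrative 1-D double-well target; TA-BG targets molecular systems.
    return (x**2 - 1.0)**2

class GaussianModel:
    """Trivial stand-in for a normalizing flow: sample / log_prob / MLE fit."""
    def __init__(self, mu=0.0, sigma=3.0):
        self.mu, self.sigma = mu, sigma     # broad start ~ high-T Phase 1 fit
    def sample(self, n):
        return rng.normal(self.mu, self.sigma, size=n)
    def log_prob(self, x):
        return -0.5*((x - self.mu)/self.sigma)**2 - np.log(self.sigma*np.sqrt(2*np.pi))
    def fit(self, x):
        # Moment matching = exact maximum likelihood for a Gaussian.
        self.mu, self.sigma = x.mean(), x.std() + 1e-6

model = GaussianModel()
temperatures = [8.0, 4.0, 2.0, 1.0, 0.5]    # annealing schedule (illustrative)
n = 4000
for T in temperatures:
    x = model.sample(n)
    # Importance weights of model samples w.r.t. the Boltzmann target at T;
    # only energy evaluations are needed, never the partition function.
    log_w = -energy(x)/T - model.log_prob(x)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # Resample a buffer according to w, then refit by maximum likelihood
    # (forward KL) on the buffer: the Phase-2 pattern described above.
    buffer = rng.choice(x, size=n, replace=True, p=w)
    model.fit(buffer)
    print(f"T={T}: buffer mean |x| = {np.abs(buffer).mean():.2f}")
```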
Key advantages:
- No need for equilibrium data or explicit partition function. Training is likelihood-free with respect to the normalizing constant, relying solely on energy evaluations.
- Order-of-magnitude reduction in target energy evaluations and substantially improved metastable state resolution compared to MCMC, MD, and baseline flow methods.
- Effective reweighting enables unbiased computation of expectation values over $p_{T_{\mathrm{target}}}$ via importance sampling.
2.2 Temperature-Steerable Flows (TSF)
TSF (Dibak et al., 2020, Dibak et al., 2021) extends normalizing flows to families of densities $\{p_T\}$ parameterized by temperature $T$:
- Prior parameterization: the Gaussian prior variance is scaled with temperature ($\sigma^2 \propto T$), aligning the latent distribution to the temperature.
- Flow parameterization: the mapping $f_\theta$ and its Jacobian obey a temperature-scaling condition; in the simplest realization the Jacobian determinant is constant in the configuration, so that rescaling the prior by the temperature ratio yields $q_T(x) \propto q_{T_0}(x)^{T_0/T}$.
Satisfying this condition, TSFs allow exact temperature transfer across a range of $T$, enabling:
- Fast, non-equilibrium generation at arbitrary $T$ without retraining;
- Embedding in parallel tempering frameworks with direct swaps across temperatures;
- Likelihood-free, unbiased sampling via importance (Zwanzig) reweighting or latent-space MCMC to correct for imperfections.
TSFs demonstrably outperform retrained RealNVP or standard flows in cross-temperature generalization and barrier crossings in many-body systems.
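The steerability property can be demonstrated with a toy volume-preserving flow. The additive couplings below are random (untrained) and purely illustrative; the point is that one fixed flow plus a rescaled prior yields samples at any temperature:

```python
import numpy as np
rng = np.random.default_rng(1)

def coupling(z, w, b):
    # Additive (NICE-style) coupling: volume-preserving, log|det J| = 0.
    # A configuration-independent Jacobian determinant is what makes the
    # flow temperature-steerable.
    z = z.copy()
    z[:, 1:] += np.tanh(z[:, :1] @ w + b)
    return z

def sample_at_temperature(T, n, layers, T0=1.0):
    # Scale the Gaussian prior's std by sqrt(T/T0) and push through the SAME
    # flow: the output density satisfies q_T(x) ∝ q_{T0}(x)^{T0/T} exactly,
    # with no retraining.
    z = rng.normal(0.0, np.sqrt(T / T0), size=(n, 2))
    for w, b in layers:
        z = coupling(z, w, b)
    return z

layers = [(rng.normal(size=(1, 1)), rng.normal(size=(1,))) for _ in range(4)]
for T in (0.25, 1.0, 4.0):
    x = sample_at_temperature(T, 10_000, layers)
    print(f"T={T}: sample variances = {x.var(axis=0).round(2)}")
```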
2.3 Temporal Score Rescaling (TSR)
TSR (Xu et al., 1 Oct 2025) targets diffusion and flow-matching generative models:
- The method rescales the learned score function at each time step to adjust the local sampling "temperature," without retraining or changing the sampling protocol.
- For Gaussian (or locally Gaussian-mixture) data, the required rescaling factor has a closed form, so that multiplying the learned score by a time-dependent scalar reproduces the score of the temperature-scaled data distribution at every noise level.
- This plug-and-play approach yields true (local) temperature scaling across deterministic and stochastic samplers and maintains multi-modality robustly, in contrast to CNS or classifier-free guidance.
TSR systematically tunes diversity and sharpness, substantially improving task-specific metrics (FID, CLIP, depth error, pose accuracy, designability) without costly retraining.
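A self-contained illustration for an exactly Gaussian target is given below. The closed-form factor $k(t)$ is derived here for the Gaussian case, and the annealed Langevin sampler is an assumption of this sketch, not the paper's exact notation or protocol:

```python
import numpy as np
rng = np.random.default_rng(2)

# Gaussian data N(mu, s_d^2) diffused with noise level sigma_t: the marginal
# score is s(x, t) = -(x - mu) / (s_d^2 + sigma_t^2). Tempering the DATA
# distribution to temperature tau (variance tau * s_d^2) changes this score
# only by the time-dependent scalar
#     k(t) = (s_d^2 + sigma_t^2) / (tau * s_d^2 + sigma_t^2),
# so rescaling the learned score by k(t) samples the tempered distribution.
mu, s_d, tau = 1.5, 1.0, 0.25          # tau < 1 sharpens, tau > 1 diversifies

def rescaled_score(x, sigma_t):
    base = -(x - mu) / (s_d**2 + sigma_t**2)            # "learned" score
    k = (s_d**2 + sigma_t**2) / (tau * s_d**2 + sigma_t**2)
    return k * base

# Annealed Langevin sampling driven by the rescaled score.
x = rng.normal(0.0, 5.0, size=20_000)
for sigma_t in np.geomspace(5.0, 0.01, 100):
    step = 0.5 * sigma_t**2
    for _ in range(5):                  # a few Langevin steps per noise level
        x += step * rescaled_score(x, sigma_t) \
             + np.sqrt(2 * step) * rng.normal(size=x.shape)
print(f"sample variance = {x.var():.3f} (target tau*s_d^2 = {tau*s_d**2:.3f})")
```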
3. Annealing, Variational, and Bayesian Approaches
3.1 Variational Tempering (VT)
VT (Mandt et al., 2014) embeds temperature as a latent variable in variational inference for exponential family models:
- Deterministic annealing can be recast via a softened ELBO with fixed or learned temperature $T$:
$$\mathcal{L}_T(q)=\frac{1}{T}\,\mathbb{E}_q\!\left[\log p(x\mid z)\right]+\mathbb{E}_q\!\left[\log p(z)\right]-\mathbb{E}_q\!\left[\log q(z)\right].$$
- VT treats the temperature as a global discrete latent variable, inferring a distribution over $T$ and thus learning an annealing schedule from data rather than prescribing one.
- Local VT allows a vector of temperatures (one per data point), providing robustness to local nonconjugacy and outlier data.
Estimation of partition functions is carried out via analytic or MC methods, retaining computational feasibility for large data. VT (and especially local VT) demonstrably improves held-out log-likelihoods and model robustness.
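The temperature update at the heart of VT can be sketched as follows. The expected log-likelihood value and the analytic log-normalizer below are illustrative stand-ins for model-specific quantities, not the paper's exact expressions:

```python
import numpy as np

# Sketch of variational tempering's temperature update: the temperature is a
# global discrete latent over a grid, and its variational posterior q(T) is a
# softmax of the tempered expected log-likelihood minus the log-normalizer of
# the tempered likelihood (analytic for exponential families, Monte Carlo
# otherwise). All quantities below are illustrative stand-ins.

temps = np.array([1.0, 2.0, 4.0, 8.0])
prior = np.full(len(temps), 1.0 / len(temps))

expected_loglik = -1250.0       # E_q[log p(x | z)]; only the likelihood is tempered
def log_normalizer(T, n_data=1000):
    # Stand-in for the log-normalizer of p(x|z)^(1/T) over the data.
    return n_data * 0.5 * np.log(T)

log_q = np.log(prior) + expected_loglik / temps - log_normalizer(temps)
q_T = np.exp(log_q - log_q.max())
q_T /= q_T.sum()
print(dict(zip(temps, q_T.round(3))))
# Early in training q(T) favors high T (smoother objective); as the fit
# improves, mass moves toward T = 1: an annealing schedule learned from data.
```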
3.2 Bayesian Neural Networks at Finite Temperature
Posterior distributions of neural networks can be generalized to a tempered form $p_T(\theta\mid\mathcal{D}) \propto \exp\!\big(-L(\theta)/T\big)$, with $L$ the (regularized) training loss. Varying $T$ systematically explores the trade-off between data fit and entropy (posterior width):
- The empirically optimal $T$ is model- and data-dependent, and rarely $T=1$; performance (test error) is minimized at nontrivial $T$.
- Replica-exchange HMC (parallel tempering) efficiently samples the $T$-deformed posteriors.
- Thermodynamic integration over $T$ provides model evidence without requiring complicated Hessian computations.
This framework is operationally likelihood-free as only energy differences are needed, never explicit partition functions.
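The exchange step makes this explicit: only energy (loss) differences enter the acceptance ratio. A minimal sketch, with stand-in energies in place of actual BNN replica losses:

```python
import numpy as np
rng = np.random.default_rng(3)

def swap_accept(E_i, E_j, T_i, T_j):
    # Parallel-tempering exchange between replicas at temperatures T_i < T_j.
    # The Metropolis ratio uses only an energy difference -- no partition
    # function or normalized likelihood ever appears.
    log_alpha = (1.0 / T_i - 1.0 / T_j) * (E_i - E_j)
    return np.log(rng.random()) < min(0.0, log_alpha)

# Illustrative use with stand-in replica energies (e.g. BNN training losses):
E = {1.0: 310.2, 2.0: 355.9}          # energy per temperature rung
if swap_accept(E[1.0], E[2.0], 1.0, 2.0):
    E[1.0], E[2.0] = E[2.0], E[1.0]   # swap configurations across rungs
print(E)
```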
4. Tempered Population and Sequential Monte Carlo Techniques
4.1 Likelihood-free parallel and simulated tempering
Within Approximate Bayesian Computation (ABC) (Baragatti et al., 2011), parallel tempering is adapted into the likelihood-free context:
- A set of chains is maintained at a ladder of tolerance levels ($\varepsilon_1 < \varepsilon_2 < \cdots < \varepsilon_K$), which play the role of an "inverse temperature".
- Exchange moves between chains allow effective transfer of parameter states, dramatically improving exploration and reducing autocorrelation compared to ABC-MCMC.
- The exchange move is likelihood-free, depending only on simulations and tolerances.
Population-based and sequential ABC tempering generalizes this scheme, allowing more aggressive bias-variance control, and is specifically advantageous for multi-modal or heavy-tailed posteriors.
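The exchange move itself takes only a few lines. The acceptance rule below is the schematic indicator form (each state must also satisfy the other chain's tolerance), with hypothetical distances standing in for simulator output:

```python
def abc_exchange(theta_a, dist_a, eps_a, theta_b, dist_b, eps_b):
    # Exchange move between two ABC chains with tolerances eps_a < eps_b.
    # It is accepted iff each chain's current simulated distance also passes
    # the OTHER chain's tolerance -- a purely simulation-based criterion with
    # no likelihood evaluation. (Schematic form; the paper expresses it as a
    # Metropolis ratio of tolerance indicators.)
    if dist_a <= eps_b and dist_b <= eps_a:
        return (theta_b, dist_b), (theta_a, dist_a)   # swap states
    return (theta_a, dist_a), (theta_b, dist_b)       # reject

# Illustrative use: distances are ||s(x_sim) - s(x_obs)|| from a simulator run.
state_strict = (0.31, 0.08)   # (theta, distance) in the eps = 0.1 chain
state_loose  = (0.58, 0.09)   # (theta, distance) in the eps = 0.5 chain
state_strict, state_loose = abc_exchange(*state_strict, 0.1, *state_loose, 0.5)
print(state_strict, state_loose)
```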
4.2 Rao-Blackwellized Tempered Sampling
RTS (Carlson et al., 2016) leverages the multinomial law of inverse temperatures in simulated tempering to efficiently estimate partition functions. With stationary joint density $\pi(x,k) \propto (r_k/\hat Z_k)\,e^{-\beta_k E(x)}$ over configurations $x$ and temperature indices $k$ (prior weights $r_k$, initial partition-function guesses $\hat Z_k$), ratios of partition functions are recovered as
$$\frac{Z_{\beta_k}}{Z_{\beta_1}} = \frac{\hat Z_k\,\bar c_k/r_k}{\hat Z_1\,\bar c_1/r_1}, \qquad \bar c_k = \frac{1}{N}\sum_{n=1}^{N}\pi(k\mid x_n),$$
where averaging the conditional $\pi(k\mid x_n)$ (Rao-Blackwellization) replaces noisy empirical counts of temperature visits. This unbiased, memory-efficient estimator outperforms standard AIS, especially for RBMs and large-scale models.
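A toy check of this estimator on a Gaussian energy, where $Z_\beta = \sqrt{2\pi/\beta}$ is known, is sketched below; exact draws from the stationary distribution stand in for the simulated-tempering chain:

```python
import numpy as np
rng = np.random.default_rng(5)

betas = np.array([0.2, 0.5, 1.0])
r = np.full(3, 1 / 3)                    # prior weights r_k
Zhat = np.ones(3)                        # deliberately poor initial guesses
E = lambda x: 0.5 * x**2                 # Gaussian energy: Z_beta = sqrt(2*pi/beta)

# Stand-in for the simulated-tempering chain: exact draws from the stationary
# joint pi(x, k) ∝ (r_k / Zhat_k) exp(-beta_k E(x)). A real run would obtain
# these by MCMC and would not need the true Z_beta used here.
Z_true = np.sqrt(2 * np.pi / betas)
pk = r * Z_true / Zhat
pk /= pk.sum()
k = rng.choice(3, size=200_000, p=pk)
x = rng.normal(0.0, 1.0 / np.sqrt(betas[k]))

# Rao-Blackwellization: average the conditional pi(k | x) over samples instead
# of counting visits to each temperature rung.
logits = np.log(r) - np.log(Zhat) - betas[None, :] * E(x)[:, None]
cond = np.exp(logits - logits.max(axis=1, keepdims=True))
cond /= cond.sum(axis=1, keepdims=True)
c_bar = cond.mean(axis=0)

Z_ratio = (Zhat * c_bar / r) / (Zhat[-1] * c_bar[-1] / r[-1])
print("RTS  Z_beta / Z_1:", Z_ratio.round(3))
print("true Z_beta / Z_1:", np.sqrt(1.0 / betas).round(3))
```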
4.3 Continuous Tempering in PDMPs
PDMP samplers (Sutton et al., 2022) such as the Zig-Zag process can be extended with a joint continuous temperature distribution $\pi(x,\beta) \propto p(x)^{\beta}\,q(x)^{1-\beta}\,\nu(\beta)$ for $\beta\in[0,1]$, interpolating between a tractable reference $q$ and the target $p$. This enables efficient exploration of multimodal posteriors, and, importantly, exact posterior samples are generated whenever the process is at $\beta = 1$.
5. Likelihood-Free Inference with Simulator Models
Distilled importance sampling (DIS) with normalizing flows (Prangle et al., 2019) introduces a temperature-parameterized family of tempered posteriors (via a kernel bandwidth $\epsilon$), progressively refining a flow proposal to approximate the true posterior:
- Each iteration draws augmented simulator-parameter samples, computes likelihood-free importance weights with the current $\epsilon$, and distills these into a new flow.
- Annealing the bandwidth $\epsilon$ toward its target corresponds to temperature annealing, improving accuracy without summary statistics, especially in high-dimensional or intractable simulator models.
This iterative, flow-based scheme offers superior flexibility for arbitrary simulators without ever evaluating the intractable likelihood.
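A compact sketch of the DIS loop on a toy simulator follows. The Gaussian proposal (standing in for the flow), the flat prior on the parameter, and the bandwidth schedule are all illustrative assumptions:

```python
import numpy as np
rng = np.random.default_rng(6)

x_obs = 2.0
def simulate(theta):
    # Toy black-box simulator: theta plus internal noise.
    return theta + rng.normal(0.0, 0.5, size=theta.shape)

mu, sigma = 0.0, 5.0                      # proposal ("flow" stand-in)
for eps in np.geomspace(4.0, 0.1, 12):    # bandwidth schedule ~ temperature
    theta = rng.normal(mu, sigma, size=5_000)
    x = simulate(theta)
    # Likelihood-free importance weights: tempered ABC kernel over a flat
    # prior, divided by the proposal density (constants cancel after
    # self-normalization).
    log_w = (-0.5 * ((x - x_obs) / eps)**2
             + 0.5 * ((theta - mu) / sigma)**2 + np.log(sigma))
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # "Distill": refit the proposal by weighted maximum likelihood.
    mu = np.sum(w * theta)
    sigma = np.sqrt(np.sum(w * (theta - mu)**2)) + 1e-3
print(f"final eps = {eps:.2f}: posterior approx N({mu:.2f}, {sigma:.2f}^2)")
```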
6. Thermodynamic Integration and Marginal Likelihood Estimation
Continuous temperature-parameterized MCMC (simulated/parallel tempering) has been extended (Stojkova et al., 2019) to avoid discrete temperature schedules and normalization constants by defining a profile-uniform prior over the temperature, resulting in efficient marginal likelihood (evidence) estimation via thermodynamic integration:
$$\log\frac{Z(1)}{Z(0)} = \int_0^1 \mathbb{E}_{\theta\sim p_\beta}\!\big[\log p(y\mid\theta)\big]\,d\beta, \qquad p_\beta(\theta) \propto p(y\mid\theta)^{\beta}\,p(\theta).$$
This avoids explicit estimation of $Z(\beta)$ at each $\beta$, allows a continuous temperature path, and yields efficient model selection.
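The identity above can be checked numerically on a conjugate toy model, where the power posteriors are Gaussian and can be sampled exactly (in practice they come from the tempered chain itself); the grid and trapezoidal quadrature are illustrative choices:

```python
import numpy as np
rng = np.random.default_rng(7)

y = rng.normal(1.0, 1.0, size=50)         # data; likelihood N(theta, 1)
n, ybar = len(y), y.mean()

def power_posterior_samples(beta, m=2_000):
    # Power posterior p_beta(theta) ∝ p(y|theta)^beta p(theta), with prior
    # N(0, 10^2): conjugate, hence exactly Gaussian.
    prec = 1 / 10**2 + beta * n
    return rng.normal(beta * n * ybar / prec, 1 / np.sqrt(prec), size=m)

betas = np.linspace(1e-4, 1.0, 40)        # continuous path, approximated by a grid
Elog = np.array([
    np.mean([-0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum((y - t)**2)
             for t in power_posterior_samples(b)])
    for b in betas
])
# Trapezoidal quadrature of E_beta[log p(y|theta)] over beta = log evidence.
log_Z = np.sum((betas[1:] - betas[:-1]) * (Elog[1:] + Elog[:-1]) / 2)
print(f"TI log-evidence ≈ {log_Z:.2f}")
```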
7. Application Domains
Molecular and Statistical Physics
- Flow-based and umbrella/metadynamics temperature-annealed methods (TA-BG (Schopmans et al., 31 Jan 2025), TSF (Dibak et al., 2020, Dibak et al., 2021), TASS (Awasthi et al., 2016)) have demonstrated superior performance (in accuracy, barrier crossing, reduced evaluations, high-dimensional efficiency) for protein, peptide, and spin models.
- Parallel tempering and nested sampling with temperature augmentation (Maillard et al., 2 Sep 2025) enable efficient thermodynamic estimation across broad temperature ranges, including quantum simulations with temperature-dependent potentials.
Machine Learning and Bayesian Inference
- Variational tempering and its local extension (Mandt et al., 2014) are particularly effective for large-scale latent-variable models (LDA, factorial mixtures) and robust inference.
- Bayesian neural network generalization, model evidence, and uncertainty quantification benefit directly from temperature sampling and continuous annealing (Baldock et al., 2019, Cecere et al., 25 Feb 2025).
- Flow-based and importance weighted inference methods (Prangle et al., 2019) bring likelihood-free temperature sampling to high-dimensional, black-box simulators, outperforming ABC in flexibility and accuracy.
LLMs and Generative Models
- Temperature-controlled sampling in LLMs offers not only diversity control but also improved test-time scaling and uncertainty quantification via voting or Monte Carlo temperature sampling (Wu et al., 2 Oct 2025, Cecere et al., 25 Feb 2025).
- Selective sampling frameworks (Troshin et al., 20 Sep 2025) and temperature/local score rescaling for diffusion models (Xu et al., 1 Oct 2025) further extend these methods to new architectures and modalities.
8. Advantages, Limitations, and Perspectives
Advantages
| Approach/class | No partition function | No retraining | Adaptive T schedule | Arbitrary T sampling | Handles multimodality | Model selection |
|---|---|---|---|---|---|---|
| TA-BG/TSF/TSR | Yes | Yes | Yes | Yes | Yes | No |
| VT/LVT | Yes (approx/analytic) | Yes | Yes | Discrete grid | Some (nonconvex VI) | No |
| Population ABC-PT | Yes | Not relevant | Yes (auto mixing) | Yes (tolerance grid) | Yes | No |
| Sim. Tempering (NF) | Yes | Yes | Yes | Yes | Yes | Yes (TI) |
| BNN via RE-HMC | Yes | Yes | Yes (exchange) | Yes | Yes | Yes (TI) |
Limitations
- Complete elimination of "mode collapse" can depend on flow expressivity and adequacy of annealing or sampling schedule.
- Partition function or temperature-normalization estimation may still be implicitly required (VT, TI), but typically only in low dimension and at modest cost.
- Reweighting or buffer-based learning may introduce variance or require importance clipping/stabilization.
- Some approaches (e.g., local VT, per-token LLM temperature) may introduce complexity in estimation or risk overfitting with insufficient data.
- In high-dimensional parameter or latent spaces, expressivity of NF models must be ensured to capture all relevant structure.
9. Summary
Likelihood-free temperature sampling provides a unified conceptual and computational framework for efficiently exploring, sampling, and performing inference in complex multimodal distributions parameterized by temperature, across statistical physics, Bayesian deep learning, simulation-based inference, and generative modeling. Methods based on temperature-annealed flows, temperature steerability, continuous or population-based tempering in MCMC, and variational/annealed latent-variable inference achieve superior coverage of configuration or parameter spaces, order-of-magnitude reductions in computation, and robustness to multimodality and outliers. They also enable powerful downstream analyses such as free-energy reconstruction, marginal likelihood computation, and optimal diversity-quality trade-offs in LLMs and diffusion models. The field continues to expand, with cross-pollination into new architectures and inference paradigms.