
Likelihood-Free Temperature Sampling

Updated 4 November 2025
  • Likelihood-free temperature sampling is a family of methods that leverages temperature parameters to explore complex, multimodal distributions without computing explicit likelihoods or partition functions.
  • Flow-based techniques like temperature-annealed Boltzmann generators and temperature-steerable flows use annealing and importance reweighting to overcome high-dimensional barriers and improve sampling efficiency.
  • These approaches empower applications in statistical physics, Bayesian inference, and generative modeling by reducing computational cost and enhancing exploration of energy landscapes.

Likelihood-free temperature sampling encompasses a family of strategies for sampling, optimization, and statistical inference over distributions parameterized by temperature (or analogous annealing parameters), without requiring explicit computation of likelihoods, partition functions, or equilibrium configurations. It has broad applications in statistical physics, Bayesian inference, generative modeling, and simulation-based inference. The unifying theme is the use of temperature (or generalized control parameters) to control distributional smoothness and connectivity, thereby improving sampling, posterior exploration, or density estimation while sidestepping the key computational bottlenecks of traditional likelihood-based approaches.

1. Mathematical Foundations and Motivation

Let $p_X(x; T) \propto \exp(-E(x)/k_B T)$ define a Boltzmann-type density or, more generally, a member of a temperature-parameterized exponential family. Direct sampling or inference over $p_X(x; T)$, especially in the presence of multimodal, high-dimensional landscapes, faces major challenges:

  • Intractable partition function $Z(T)$: $p_X(x; T)$ is typically unnormalized.
  • Equilibrium training data is often unavailable or computationally prohibitive to generate.
  • Traditional MCMC and MD approaches exhibit poor mixing and slow barrier crossing at low $T$.

Likelihood-free temperature sampling approaches circumvent explicit likelihood/partition evaluation (or training on equilibrium data) via techniques including stochastic annealing, flow-based models with temperature parametrization, marginal-auxiliary tempering, temperature-extended nested sampling, and buffer-based reweighting or parametrization.
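
As a concrete illustration of the setting, the sketch below defines an unnormalized tempered Boltzmann density and samples it with plain Metropolis MCMC; only energy evaluations are needed, never $Z(T)$. The double-well energy and step sizes are illustrative choices, not taken from any cited paper:

```python
import numpy as np

def energy(x):
    # Illustrative double-well potential with a barrier at x = 0.
    return (x**2 - 1.0)**2

def log_p_unnorm(x, T):
    # Unnormalized log-density; the partition function Z(T) never appears.
    return -energy(x) / T

def metropolis(T, n_steps=10_000, step=0.5, x0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        prop = x + step * rng.standard_normal()
        if np.log(rng.random()) < log_p_unnorm(prop, T) - log_p_unnorm(x, T):
            x = prop
        samples.append(x)
    return np.array(samples)

# At low T the chain rarely crosses the barrier between the two modes;
# at high T both modes are visited -- the basic motivation for tempering.
low, high = metropolis(T=0.05), metropolis(T=1.0)
print("fraction in left mode:", (low < 0).mean(), "(T=0.05) vs",
      (high < 0).mean(), "(T=1.0)")
```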

2. Flow-based Methods for Likelihood-Free Temperature Sampling

2.1 Temperature-Annealed Boltzmann Generators (TA-BG)

TA-BG (Schopmans et al., 31 Jan 2025) exemplifies likelihood-free temperature sampling with normalizing flows for molecular equilibrium distributions:

  • Phase 1: Reverse KL training at a high temperature $T_{\text{start}}$. The flow $x = g(z; \theta)$ is fit to $p_X(x; T_{\text{start}})$ using:

$$\mathrm{KL}_\theta[q_X \| p_X] = C - \mathbb{E}_{z \sim q_Z}\left[\log p_X(g(z;\theta)) + \log\left|\det J\right|\right]$$

This high-temperature regime prevents mode collapse and ensures ergodic coverage of phase space.

  • Phase 2: Iterative, buffered annealing to the target $T_{\text{target}}$ (see the sketch after this list):

    • At each step, sample from the flow at $T_i$ and compute importance weights

      $$w(x) = \frac{p_X(x; T_{i+1})}{q_X(x; \theta)}$$

      then resample a buffer according to $w(x)$.

    • Fine-tune the flow at $T_{i+1}$ by maximum likelihood (forward KL) on the resampled buffer.

  • Final: Optionally fine-tune at the target $T_{\text{target}}$ with a final round of reweighting.
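
The Phase 2 loop can be sketched as follows. This is a minimal stand-in, assuming a generic `flow` object with hypothetical `sample` and `fit` methods; the actual TA-BG implementation of Schopmans et al. differs in detail:

```python
import numpy as np

def anneal(flow, energy, temps, buffer_size=10_000, seed=0):
    """Iteratively anneal a trained flow from temps[0] down to temps[-1].

    `flow` is a hypothetical normalizing-flow object exposing
    sample(n) -> (x, log_q) and fit(x) (maximum-likelihood training).
    Only energy evaluations of the target are required -- no Z(T).
    """
    rng = np.random.default_rng(seed)
    for T_next in temps[1:]:
        x, log_q = flow.sample(buffer_size)
        # Unnormalized log-target at the next (lower) temperature.
        log_p = -energy(x) / T_next
        log_w = log_p - log_q
        w = np.exp(log_w - log_w.max())      # stabilize before normalizing
        w /= w.sum()
        # Resample a buffer according to the importance weights, then
        # fine-tune the flow on it with forward KL (maximum likelihood).
        idx = rng.choice(buffer_size, size=buffer_size, p=w)
        flow.fit(x[idx])
    return flow
```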

Key advantages:

  • No need for equilibrium data or explicit partition function. Training is likelihood-free with respect to the normalizing constant, relying solely on energy evaluations.
  • Order-of-magnitude reduction in target energy evaluations and substantially improved metastable state resolution compared to MCMC, MD, and baseline flow methods.
  • Effective reweighting enables unbiased computation of expectation values over $p_X(x; T)$ via importance sampling.

2.2 Temperature-Steerable Flows (TSF)

TSF (Dibak et al., 2020; Dibak et al., 2021) extends normalizing flows to families of densities parameterized by $\tau = k_B T$:

  • Prior parameterization: $p_Z^\tau(z) = \mathcal{N}(z \mid 0, \tau)$ aligns the Gaussian prior to the temperature.
  • Flow parameterization: the mapping $f_\tau(z)$ and its Jacobian obey a temperature-scaling condition:

$$p_X^{\tau'}(x) \propto \left[p_X^{\tau}(x)\right]^{\kappa}, \quad \kappa = \tau/\tau'$$

When this condition is satisfied, TSFs allow exact temperature transfer across $\tau$, enabling:

  • fast, non-equilibrium generation at arbitrary $T$ without retraining;
  • embedding in parallel tempering frameworks with direct swaps across $T$;
  • likelihood-free, unbiased sampling via importance (Zwanzig) reweighting or latent-space MCMC to correct for imperfections.

TSFs demonstrably outperform retrained RealNVP or standard flows in cross-temperature generalization and barrier crossings in many-body systems.
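
The steerability condition is easy to verify in a toy linear case. The sketch below (a minimal illustration, not the architecture of Dibak et al.) uses a fixed linear map with a temperature-scaled Gaussian prior, for which the scaling law holds exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))      # fixed, temperature-independent flow

def sample(tau, n=100_000):
    # Temperature enters only through the prior variance tau.
    z = np.sqrt(tau) * rng.standard_normal((n, 2))
    return z @ A.T

def log_p(x, tau):
    # Unnormalized model log-density N(0, tau * A A^T).
    cov_inv = np.linalg.inv(tau * A @ A.T)
    return -0.5 * np.einsum("ni,ij,nj->n", x, cov_inv, x)

x = sample(tau=1.0)
# Steerability: log p^{tau'}(x) = (tau/tau') * log p^{tau}(x) + const.
lhs = log_p(x, tau=0.5)
rhs = (1.0 / 0.5) * log_p(x, tau=1.0)
print("max deviation from scaling law:", np.abs(lhs - rhs).max())
```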

2.3 Temporal Score Rescaling (TSR)

TSR (Xu et al., 1 Oct 2025) targets diffusion and flow-matching generative models:

  • The method rescales the learned score function at each time step to adjust the local sampling "temperature," without retraining or changing the sampling protocol.
  • For Gaussian (or local Gaussian mixture) data,

$$\tilde{s}_\theta(x, t) = r_t(k, \sigma)\, s_\theta(x, t), \quad r_t(k, \sigma) = \frac{\eta_t \sigma^2 + 1}{\eta_t \sigma^2 / k + 1}$$

  • This plug-and-play approach yields true (local) temperature scaling across deterministic and stochastic samplers and maintains multi-modality robustly, in contrast to CNS or classifier-free guidance.

TSR systematically tunes diversity and sharpness, substantially improving task-specific metrics (FID, CLIP, depth error, pose accuracy, designability) without costly retraining.
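
A minimal sketch of plugging the rescaling into a generic reverse-diffusion step follows. The rescaling factor is the formula above; `score_model`, the schedule functions, and the diffusion coefficient are illustrative stand-ins, not the authors' code:

```python
import numpy as np

def tsr_rescale(score, k, sigma, eta_t):
    # Temporal score rescaling: multiply the learned score by r_t(k, sigma).
    # k > 1 sharpens (lower temperature), k < 1 flattens (higher temperature).
    r_t = (eta_t * sigma**2 + 1.0) / (eta_t * sigma**2 / k + 1.0)
    return r_t * score

def reverse_step(x, t, dt, k, score_model, sigma_of, eta_of, rng):
    # One Euler-Maruyama step of a reverse SDE with the score replaced
    # by its TSR-rescaled version; g2 is an illustrative g(t)^2 schedule.
    sigma, eta = sigma_of(t), eta_of(t)
    s = tsr_rescale(score_model(x, t), k, sigma, eta)
    g2 = 2.0 * sigma
    return x + g2 * s * dt + np.sqrt(g2 * dt) * rng.standard_normal(x.shape)
```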

3. Annealing, Variational, and Bayesian Approaches

3.1 Variational Tempering (VT)

VT (Mandt et al., 2014) embeds temperature as a latent variable in variational inference for exponential family models:

  • Deterministic annealing can be recast via a softened ELBO with fixed or learned temperature:

$$L_A(\lambda, \phi; T) = \mathbb{E}_q[\log p(\beta \mid \alpha)] - \mathbb{E}_q[\log q(\beta \mid \lambda)] + \sum_{i=1}^{N} \left( \frac{1}{T}\, \mathbb{E}_q[\log p(x_i, z_i \mid \beta)] - \mathbb{E}_q[\log q(z_i \mid \phi_i)] \right)$$

  • VT treats the temperature as a global discrete latent variable, inferring a distribution over $T$, thus learning an annealing schedule from data rather than prescribing one.
  • Local VT allows for a vector of temperatures, providing robustness to local nonconjugacy and outlier data.

Estimation of partition functions $C(T)$ is carried out via analytic or MC methods, retaining computational feasibility for large data. VT (and especially local VT) demonstrably improves held-out log-likelihoods and model robustness.
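
The core temperature update can be sketched as follows. This is a simplified stand-in, assuming the expected log-likelihood and KL terms have already been computed by some mean-field routine and that the normalizers $C(T)$ are precomputed; the full coordinate-ascent updates of Mandt et al. are more involved:

```python
import numpy as np

def annealed_elbo(exp_loglik, kl_local, kl_global, T):
    # Softened ELBO L_A: the data term is divided by the temperature T.
    return exp_loglik / T - kl_local - kl_global

def update_q_temperature(exp_loglik, kl_local, kl_global, temp_grid, log_prior):
    """Variational tempering: treat T as a discrete latent variable and infer
    a categorical q(T) over a fixed grid, instead of hand-tuning an annealing
    schedule. log_prior holds log p(T) - log C(T) per grid point."""
    log_q = np.array([annealed_elbo(exp_loglik, kl_local, kl_global, T)
                      for T in temp_grid]) + log_prior
    log_q -= log_q.max()        # numerical stabilization
    q = np.exp(log_q)
    return q / q.sum()
```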

3.2 Bayesian Neural Networks at Finite Temperature

Posterior distributions of neural network weights can be generalized to

$$\mathrm{prob}(w \mid T, D) \propto \exp\left(-\frac{1}{T} E(D \mid w)\right) \mathrm{prob}(w)$$

Varying $T$ systematically explores the trade-off between data fit and entropy (posterior width):

  • The empirically optimal $T^*$ is model- and data-dependent and rarely equals $T = 1$; test error is minimized at nontrivial $T$.
  • Replica-exchange HMC (parallel tempering) efficiently samples the $T$-deformed posteriors.
  • Thermodynamic integration in TT provides model evidence without requiring complicated Hessian computations.

This framework is operationally likelihood-free as only energy differences are needed, never explicit partition functions.
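
A minimal sketch of the tempered log-posterior a replica-exchange sampler would target, together with the swap move, is given below. The squared-error energy and Gaussian prior are illustrative assumptions:

```python
import numpy as np

def tempered_log_posterior(w, X, y, T, prior_var=1.0):
    # log prob(w | T, D) up to a constant: -(1/T) E(D|w) + log prob(w).
    # E(D|w) is an illustrative squared-error energy for a linear model.
    energy = 0.5 * np.sum((X @ w - y) ** 2)
    log_prior = -0.5 * np.sum(w**2) / prior_var
    return -energy / T + log_prior

def swap_accept(lp_i_at_Ti, lp_i_at_Tj, lp_j_at_Ti, lp_j_at_Tj, rng):
    # Replica-exchange move between chains i and j: only (tempered)
    # energy differences are needed, never partition functions.
    log_alpha = (lp_i_at_Tj + lp_j_at_Ti) - (lp_i_at_Ti + lp_j_at_Tj)
    return np.log(rng.random()) < log_alpha
```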

4. Tempered Population and Sequential Monte Carlo Techniques

4.1 Likelihood-free parallel and simulated tempering

Within Approximate Bayesian Computation (ABC) (Baragatti et al., 2011), parallel tempering is adapted into the likelihood-free context:

  • A set of chains is maintained at increasing tolerance levels $\varepsilon$, which play the role of "inverse temperature".
  • Exchange moves between chains allow effective transfer of parameter states, dramatically improving exploration and reducing autocorrelation compared to ABC-MCMC.
  • The exchange move is likelihood-free, depending only on simulations and tolerances.

Population-based and sequential ABC tempering generalizes to more aggressive bias-variance control and is particularly advantageous for multimodal or heavy-tailed posteriors.
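
A sketch of the likelihood-free exchange move between two ABC chains follows. The acceptance rule shown (swap when each chain's simulated distance also satisfies the other chain's tolerance) is the natural rule for indicator ABC kernels; Baragatti et al. give the exact construction:

```python
def abc_exchange(theta_i, dist_i, eps_i, theta_j, dist_j, eps_j):
    """Attempt to swap states between ABC chains with tolerances eps_i < eps_j.

    dist_* is the distance between observed data and data simulated at
    theta_*. No likelihood appears: the move depends only on simulations
    and tolerances, and is accepted iff each state also lies within the
    other chain's tolerance."""
    if dist_i <= eps_j and dist_j <= eps_i:
        return (theta_j, dist_j), (theta_i, dist_i)   # swap accepted
    return (theta_i, dist_i), (theta_j, dist_j)       # swap rejected
```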

4.2 Rao-Blackwellized Tempered Sampling

RTS (Carlson et al., 2016) leverages the multinomial law of inverse temperatures in simulated tempering to efficiently estimate partition functions:

$$\hat{Z}_k^{\text{RTS}} = \hat{Z}_k\, \frac{r_1}{r_k}\, \frac{\hat{c}_k}{\hat{c}_1}, \qquad \hat{c}_k = \frac{1}{N} \sum_{i=1}^N q(\beta_k \mid x^{(i)})$$

This unbiased, memory-efficient estimator outperforms standard AIS, especially for RBMs and large-scale models.
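
A minimal sketch of the estimator, assuming the per-sample conditionals $q(\beta_k \mid x^{(i)})$ have already been computed from the unnormalized energies and the pseudo-weights $\hat{Z}_k$, $r_k$ used by the tempering chain:

```python
import numpy as np

def rts_partition_estimates(q_beta_given_x, Z_hat, r):
    """Rao-Blackwellized tempered sampling (RTS) estimator.

    q_beta_given_x : (N, K) array, q(beta_k | x_i) for each chain sample.
    Z_hat          : (K,) initial pseudo partition-function guesses.
    r              : (K,) prior weights over inverse temperatures.
    Returns corrected estimates of Z_k, with Z_1 as the reference.
    """
    c_hat = q_beta_given_x.mean(axis=0)               # \hat{c}_k
    return Z_hat * (r[0] / r) * (c_hat / c_hat[0])    # \hat{Z}_k^{RTS}
```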

4.3 Continuous Tempering in PDMPs

PDMP samplers (Sutton et al., 2022) such as the Zig-Zag process can be extended with a joint continuous temperature distribution:

$$\omega(d\bm{x}, d\beta) \propto q(\bm{x},\beta)\,(1-\alpha)\,\kappa(\beta) + q(\bm{x})\,\alpha\,\kappa(1)\,\delta_{\beta=1}$$

This enables efficient exploration of multimodal posteriors and, importantly, yields exact posterior samples whenever the process is at $\beta = 1$.
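
The idea can be illustrated with a simple discrete-time analogue: random-walk Metropolis on the joint space $(x, \beta)$ rather than an actual PDMP. This is only a sketch of continuous tempering; the Zig-Zag construction of Sutton et al. replaces it with event-driven deterministic dynamics and a point mass at $\beta = 1$:

```python
import numpy as np

def reflect(b, lo=1e-3, hi=1.0):
    # Reflect proposals back into (lo, hi]; reflection keeps the
    # random-walk proposal symmetric, so no Hastings correction is needed.
    while b < lo or b > hi:
        b = 2 * lo - b if b < lo else 2 * hi - b
    return b

def joint_tempering_mcmc(log_q, x0, n_steps=50_000, seed=0):
    """Random-walk Metropolis on (x, beta) with beta in (0, 1] continuous.

    log_q(x, beta) is the tempered unnormalized log-target, e.g.
    beta * log_lik(x) + log_prior(x). Unlike the PDMP construction there
    is no atom at beta = 1, so posterior estimates use samples with beta
    near 1 (or a final reweighting)."""
    rng = np.random.default_rng(seed)
    x, beta, out = x0, 1.0, []
    for _ in range(n_steps):
        xp = x + 0.5 * rng.standard_normal()
        bp = reflect(beta + 0.1 * rng.standard_normal())
        if np.log(rng.random()) < log_q(xp, bp) - log_q(x, beta):
            x, beta = xp, bp
        out.append((x, beta))
    return np.array(out)
```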

5. Likelihood-Free Inference with Simulator Models

Distilled importance sampling (DIS) with normalizing flows (Prangle et al., 2019) introduces a temperature-parameterized family of tempered posteriors (via kernel bandwidth $\epsilon$), progressively refining a flow proposal to approximate the true posterior:

  • Each iteration draws augmented simulator parameter samples, computes likelihood-free importance weights with the current $p_\epsilon$, and distills these into a new flow.
  • Annealing $\epsilon$ corresponds to temperature annealing for improved accuracy without summary statistics, especially in high-dimensional or intractable simulator models.

This iterative, flow-based scheme offers superior flexibility for arbitrary simulators without requiring a tractable likelihood.
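
A minimal sketch of one DIS iteration under simplifying assumptions: a hypothetical `flow` object with `sample` and `fit_weighted` methods, a vectorized `log_prior`, and a Gaussian ABC kernel on a distance as the tempered target (one common choice, not necessarily the authors' exact setup):

```python
import numpy as np

def dis_iteration(flow, simulate, distance, log_prior, y_obs, eps, n=5_000):
    """One distilled-importance-sampling step toward tolerance eps.

    flow: hypothetical object with sample(n) -> (theta, log_q) and
    fit_weighted(theta, w). The intractable likelihood is replaced by a
    Gaussian ABC kernel of bandwidth eps on a simulation-based distance."""
    theta, log_q = flow.sample(n)
    d = np.array([distance(simulate(t), y_obs) for t in theta])
    log_target = log_prior(theta) - 0.5 * (d / eps) ** 2
    log_w = log_target - log_q
    w = np.exp(log_w - log_w.max())
    flow.fit_weighted(theta, w / w.sum())    # distill into a new flow
    return flow

# Annealing eps -> 0 over iterations plays the role of temperature annealing.
```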

6. Thermodynamic Integration and Marginal Likelihood Estimation

Continuous temperature-parameterized MCMC (simulated/parallel tempering) has been extended (Stojkova et al., 2019) to avoid discrete temperature schedules and normalization constants by defining a profile-uniform prior over the temperature, enabling efficient marginal likelihood (evidence) estimation via thermodynamic integration:

$$\log P(\mathbf{Y}) = \int_0^1 \mathbb{E}_{\theta \mid \mathbf{Y}, \tau}\left[\log P(\mathbf{Y} \mid \theta)\right] d\tau$$

This avoids explicit estimation of $z(\mathbf{Y} \mid \tau)$ at each $\tau$, allows a continuous temperature path, and yields efficient model selection.
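
A minimal sketch of the thermodynamic-integration quadrature, assuming posterior samples at each grid temperature are already available from a tempering sampler (the continuous-path method above avoids even this discretization):

```python
import numpy as np

def thermodynamic_integration(taus, loglik_samples):
    """Estimate log P(Y) = \int_0^1 E_{theta|Y,tau}[log P(Y|theta)] dtau.

    taus           : increasing grid of temperatures in [0, 1].
    loglik_samples : list of arrays; loglik_samples[j][i] is log P(Y|theta_i)
                     for posterior draws theta_i at temperature taus[j].
    Uses the trapezoidal rule over the discretized temperature path."""
    means = np.array([np.mean(ll) for ll in loglik_samples])
    return np.trapz(means, taus)
```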

7. Application Domains

Molecular and Statistical Physics

  • Flow-based and umbrella/metadynamics temperature-annealed methods (TA-BG (Schopmans et al., 31 Jan 2025), TSF (Dibak et al., 2020, Dibak et al., 2021), TASS (Awasthi et al., 2016)) have demonstrated superior performance (in accuracy, barrier crossing, reduced evaluations, high-dimensional efficiency) for protein, peptide, and spin models.
  • Parallel tempering and nested sampling with temperature augmentation (Maillard et al., 2 Sep 2025) enable efficient thermodynamic estimation across broad $T$ ranges, including quantum simulations with temperature-dependent potentials.

Machine Learning and Bayesian Inference

  • Variational tempering and its local extension (Mandt et al., 2014) are particularly effective for large-scale latent-variable models (LDA, factorial mixtures) and robust inference.
  • Bayesian neural network generalization, model evidence, and uncertainty quantification benefit directly from temperature sampling and continuous annealing (Baldock et al., 2019, Cecere et al., 25 Feb 2025).
  • Flow-based and importance weighted inference methods (Prangle et al., 2019) bring likelihood-free temperature sampling to high-dimensional, black-box simulators, outperforming ABC in flexibility and accuracy.

LLMs and Generative Models

  • Score-rescaling methods such as TSR (Xu et al., 1 Oct 2025) provide training-free temperature control for diffusion and flow-matching models, tuning the diversity-sharpness tradeoff at sampling time.
  • In LLM decoding, sampling temperature governs the diversity-quality tradeoff; per-token temperature schemes extend this control, though they can add estimation complexity (see Limitations below).

8. Advantages, Limitations, and Perspectives

Advantages

| Approach/class | No partition function | No retraining | Adaptive $T$ schedule | Arbitrary $T$ sampling | Handles multimodality | Model selection |
|---|---|---|---|---|---|---|
| TA-BG / TSF / TSR | Yes | Yes | Yes | Yes | Yes | No |
| VT / LVT | Yes (approx./analytic) | Yes | Yes | Discrete grid | Some (nonconvex VI) | No |
| Population ABC-PT | Yes | Not relevant | Yes (auto mixing) | Yes (tolerance grid) | Yes | No |
| Sim. tempering (NF) | Yes | Yes | Yes | Yes | Yes | Yes (TI) |
| BNN via RE-HMC | Yes | Yes | Yes (exchange) | Yes | Yes | Yes (TI) |

Limitations

  • Whether mode collapse is fully eliminated depends on flow expressivity and the adequacy of the annealing or sampling schedule.
  • Partition-function or temperature-normalization estimation may still be implicitly required (VT, TI), though typically in low dimension and at modest cost.
  • Reweighting or buffer-based learning may introduce variance or require importance clipping/stabilization.
  • Some approaches (e.g., local VT, per-token LLM temperature) may introduce complexity in estimation or risk overfitting with insufficient data.
  • In high-dimensional parameter or latent spaces, expressivity of NF models must be ensured to capture all relevant structure.

9. Summary

Likelihood-free temperature sampling provides a unified conceptual and computational framework for efficiently exploring, sampling, and performing inference in complex multimodal distributions parameterized by temperature, across statistical physics, Bayesian deep learning, simulation-based inference, and generative modeling. Methods based on temperature-annealed flows, temperature steerability, continuous or population-based tempering in MCMC, and variational/annealed latent-variable inference achieve superior coverage of configuration or parameter spaces, order-of-magnitude reductions in computation, and robustness to multimodality and outliers. They also enable powerful downstream analyses such as free-energy reconstruction, marginal-likelihood computation, and tuning of the diversity-quality tradeoff in LLMs and diffusion models. The field continues to expand, with cross-pollination into new architectures and inference paradigms.
