- The paper introduces PITA, which combines temperature annealing with diffusion smoothing to efficiently overcome sampling challenges in rugged Boltzmann distributions.
- It leverages a staged training approach with Feynman-Kac SMC dynamics to gradually transition from high to low temperatures.
- Experimentally, PITA substantially improves sample quality and efficiency over traditional methods on molecular benchmarks such as the LJ-13 particle system and peptide conformations.
Progressive Inference-Time Annealing of Diffusion Models for Sampling from Boltzmann Densities
This paper presents Progressive Inference-Time Annealing (PITA), a framework that addresses the challenge of sampling from complex, unnormalized Boltzmann distributions—an issue central to computational chemistry and statistical physics. The method combines two orthogonal probabilistic interpolation schemes: temperature annealing of the target Boltzmann density and diffusion-based smoothing commonly exploited in generative models. This synergy yields an efficient, scalable amortized sampler capable of tackling high-dimensional, highly multi-modal density landscapes prevalent in N-body particle systems and molecular simulations in Cartesian coordinates.
Problem Setting and Motivation
Sampling from the Boltzmann distribution at low temperatures is notoriously difficult due to the landscape’s ruggedness: energy barriers impede mixing and limit feasible data collection via direct molecular dynamics (MD) or Markov Chain Monte Carlo (MCMC). While classical approaches like parallel tempering or Sequential Monte Carlo exploit interpolating sequences of densities for enhanced exploration, these can suffer from mode teleportation and computationally prohibitive time scales for realistic systems. Diffusion models, which rely on learned Stein scores across a noising–denoising path, have demonstrated limited efficacy in this domain—primarily because they require too many expensive energy function evaluations and lack ground-truth score supervision in the absence of data.
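To see why high temperature helps, consider a toy double-well energy: raising the temperature (lowering β) flattens the barrier, leaving more probability mass in the transition region and making mixing between modes far easier. A minimal numerical illustration (the double-well energy and grid are illustrative choices, not from the paper):

```python
import numpy as np

def double_well_energy(x):
    """Toy 1-D double-well energy with minima at x = ±1 and a barrier at x = 0."""
    return (x**2 - 1.0) ** 2

x = np.linspace(-2.0, 2.0, 2001)
energies = double_well_energy(x)

barrier_mass = {}
for beta in (1.0, 10.0):  # beta = 1/T; larger beta means lower temperature
    p = np.exp(-beta * energies)
    p /= p.sum()  # normalize on the grid
    barrier_mass[beta] = p[np.abs(x) < 0.25].sum()  # probability mass near the barrier
    print(f"beta={beta}: mass near barrier = {barrier_mass[beta]:.4f}")
```

At β = 1 the barrier region carries appreciable mass, so a local sampler crosses between wells readily; at β = 10 that mass nearly vanishes, which is exactly the regime where direct MD/MCMC stalls.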
PITA Framework
PITA integrates temperature annealing and diffusion smoothing within a staged training and sampling approach:
- High-Temperature Initialization:
- Start with a high-temperature (low β) Boltzmann density, where the target distribution is smoother and easier to sample from using classical techniques (e.g., short, parallelized MCMC chains).
- Use these samples to train the first-stage diffusion model at the current (high) temperature.
- Progressive Annealing and Model Chaining:
- After training at inverse temperature β_i, use a Feynman-Kac-based inference-time SMC procedure to progressively anneal samples to a slightly lower temperature, i.e., a higher inverse temperature β_{i+1} > β_i.
- This SMC-based inference anneals not only the endpoint of the distribution but also the entire time-marginal path of the diffusion model, maintaining smooth transition and low-variance sample reweighting.
- The annealed sample set then serves as data to train a new diffusion model at β_{i+1}.
- Iteration Until Target Temperature:
- Repeat this procedure until the final, low-temperature target is reached, chaining together diffusion models trained at progressively lower temperatures.
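The annealing move between stages can be caricatured as a single SNIS reweighting/resampling step with the incremental tempering weight w(x) ∝ exp(−(β_{i+1} − β_i) E(x)). The sketch below applies this to a toy Gaussian energy; the actual PITA procedure additionally anneals the whole diffusion-time marginal path, which is omitted here, and `anneal_step` is an illustrative name, not the paper's API:

```python
import numpy as np

rng = np.random.default_rng(0)

def anneal_step(samples, energy_fn, beta_prev, beta_next, rng):
    """One simplified SNIS + multinomial-resampling step from beta_prev to beta_next."""
    log_w = -(beta_next - beta_prev) * energy_fn(samples)
    log_w -= log_w.max()              # stabilize before exponentiating
    w = np.exp(log_w)
    w /= w.sum()
    ess = 1.0 / np.sum(w**2)          # effective sample size of the weights
    idx = rng.choice(len(samples), size=len(samples), p=w)
    return samples[idx], ess

# Toy usage: Gaussian energy E(x) = x^2 / 2, so p_beta is N(0, 1/beta).
samples = rng.normal(scale=np.sqrt(1.0 / 0.5), size=5000)  # draws at beta = 0.5
resampled, ess = anneal_step(samples, lambda x: 0.5 * x**2, 0.5, 0.6, rng)
print(f"ESS after annealing step: {ess:.1f} of {len(samples)}")
```

For this gentle step the resampled set closely matches the β = 0.6 target (variance ≈ 1/0.6) with little loss of effective sample size; larger jumps degrade the ESS rapidly, which motivates the sequential bridging described above.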
Algorithmic Architecture:
- At each temperature step, jointly train:
- A score model s_t(x; θ) ≈ ∇ log p_t(x) using denoising score matching (DSM) and, for low-noise regimes, target score matching objectives.
- An energy-based model U_t(x; η) regressed from the score model, using an energy distillation loss to stabilize SNIS-based reweighting in the Feynman-Kac SMC procedure.
- Feynman-Kac SDE/SMC dynamics are parameterized so that sample weights become constant in the ideal case of perfect score and energy models (mitigating resampling collapse during bridging steps).
- Sample weighting and resampling (adapted from SMC theory) are employed iteratively at each stage to account for the density ratio changes induced by temperature reduction.
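The DSM objective can be sanity-checked on a 1-D Gaussian, where the optimal score is known in closed form: for clean data x0 ~ N(0, 1) noised as x_t = x0 + σε, the DSM regression target is −(x_t − x0)/σ², and the true score is −x/(1 + σ²). This sketch fits the simplest possible score model, a linear function, by least squares (a toy stand-in for the paper's neural score model):

```python
import numpy as np

rng = np.random.default_rng(1)

sigma = 0.5
x0 = rng.normal(size=200_000)            # clean 1-D Gaussian data
xt = x0 + sigma * rng.normal(size=x0.shape)  # noised samples x_t = x0 + sigma * eps
target = -(xt - x0) / sigma**2           # DSM regression target

# Least-squares fit of s(x) = a * x to the DSM target; the minimizer
# should approach the true score slope a* = -1 / (1 + sigma**2).
a_hat = (xt * target).sum() / (xt * xt).sum()
print(f"fitted slope {a_hat:.4f} vs analytic {-1 / (1 + sigma**2):.4f}")
```

The fitted slope converges to the analytic score slope, illustrating why DSM trained only on noised samples still recovers ∇ log p_t without ground-truth score supervision.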
```python
# High-level PITA training loop (pseudocode: the helper functions are
# stand-ins for the MCMC, SMC-annealing, and diffusion-training subroutines).
previous_model, previous_data, beta_prev = None, None, None
for beta in decreasing_temperature_schedule:  # temperature falls, so beta = 1/kT rises
    if previous_model is None:
        data = run_high_temp_mcmc(beta)  # classical sampling is feasible at high temperature
    else:
        data = smc_feynman_kac_anneal(previous_model, previous_data, beta_prev, beta)
    model = train_diffusion_model(data, beta)
    previous_model, previous_data, beta_prev = model, data, beta
```
This chaining regime allows the amortized sampler to bypass energy barriers by mixing via temperature annealing and to avoid mass shifting/teleportation through learned diffusion smoothing.
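The energy distillation idea, regressing a scalar potential whose gradient matches the learned score, can be sketched in one dimension. Here the "score model" is the exact Gaussian score and the energy model is a quadratic, so the regression has a closed form; all names are illustrative, and PITA uses neural networks for both components:

```python
import numpy as np

rng = np.random.default_rng(3)

def score_model(x):
    """Stand-in for a trained s_t(x; theta); here the exact N(0, 1) score."""
    return -x

# Distill an energy U(x) = c * x**2 by gradient matching: fit c so that
# -dU/dx = -2*c*x agrees with the score model in least squares.
x = rng.normal(size=10_000)
c_hat = -(score_model(x) * x).sum() / (2.0 * (x * x).sum())
print(f"distilled coefficient c = {c_hat:.4f} (analytic value 0.5)")
```

The recovered potential U(x) = x²/2 matches the true Gaussian energy up to an additive constant; it is this distilled scalar energy that makes the SNIS weights in the Feynman-Kac procedure computable.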
Experimental Results and Numerical Claims
The empirical evaluation focuses on three molecular systems:
- LJ-13: a 13-particle Lennard-Jones cluster.
- Alanine dipeptide and alanine tripeptide: canonical molecular-conformation benchmarks, treated in full Cartesian coordinates.
Key results include:
- Significantly improved sample quality and mode coverage over all baselines, including classical importance sampling, parallel tempering, and diffusion models trained directly on MD data.
- For LJ-13, PITA achieves a 2-Wasserstein distance of 0.04±0.00 for interatomic distances and 2.26±0.21 for energies, far outperforming competing samplers.
- For peptides, PITA produces conformational ensembles capturing both slow kinetic modes (as seen in TICA plots) and accurate energy distributions. Notably, other diffusion-based samplers fail to scale or collapse to dominant modes at low temperature.
The energy-evaluation budget required by PITA is drastically lower than for direct diffusion-based or MD-based sampling; buffer reuse and staged learning amortize the cost across the intermediate models.
Practical Implications
PITA's framework has direct practical utility in fields requiring principled equilibrium sampling where one cannot rely on long, well-mixed MD chains due to computational cost or limited exploration at low temperature—for example:
- Protein folding and conformational landscape exploration.
- Materials design via atomistic simulation.
- Bayesian inference in physical models with intractable likelihoods.
Beyond domain-specific use, PITA demonstrates a general approach for leveraging learned probabilistic models where direct data access is only feasible for "easier" (smoothed/annealed) distributions, introducing an efficient scheme for amortized importance path learning.
Key Implementation Considerations:
- Energy-based model training must be carefully regularized and stabilized for the SNIS estimator to remain effective and avoid degeneracy.
- Choice of temperature steps is critical—too aggressive annealing leads to low effective sample size; PITA’s sequential bridging mitigates this.
- Architecture and preconditioning (e.g., DiT backbones conditioned on temperature) are crucial for molecular data, as reflected in the strong empirical performance.
- Computational requirements are dominated by energy evaluations during SMC and initial sampling; subsequent training is relatively cheap, which is advantageous for repeated inference/evaluation.
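The effective-sample-size concern is easy to demonstrate: reweighting samples across a large inverse-temperature jump collapses the ESS, while a gentle step preserves it. A toy multivariate-Gaussian illustration (the dimension, step sizes, and sample count are chosen for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)

def ess_fraction(beta_prev, beta_next, dim=50, n=20_000):
    """ESS/N after tempering reweighting for the Gaussian energy E(x) = ||x||^2 / 2."""
    x = rng.normal(scale=np.sqrt(1.0 / beta_prev), size=(n, dim))
    log_w = -(beta_next - beta_prev) * 0.5 * (x**2).sum(axis=1)
    w = np.exp(log_w - log_w.max())  # stabilized incremental weights
    w /= w.sum()
    return 1.0 / (n * np.sum(w**2))

small = ess_fraction(1.0, 1.1)  # gentle inverse-temperature step
large = ess_fraction(1.0, 4.0)  # aggressive jump
print(f"gentle step beta 1.0 -> 1.1: ESS/N = {small:.3f}")
print(f"aggressive step beta 1.0 -> 4.0: ESS/N = {large:.2e}")
```

In 50 dimensions the gentle step retains most of the sample budget, while the aggressive jump concentrates nearly all weight on a handful of samples; PITA's sequence of small bridging steps is designed to stay in the former regime.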
Limitations and Future Directions
The main limitations stem from the need for robust EBM training and careful temperature-schedule design. Additionally, jointly optimizing the score and energy models increases computational and memory overhead. Learning optimal temperature schedules or adaptive annealing strategies is a promising avenue for further progress.
Looking ahead, PITA opens promising directions for scalable, black-box equilibrium sampling in high-dimensional scientific domains, with potential for adaptation to broader classes of distributions and integration with other amortized inference techniques.