- The paper introduces PITA, which combines temperature annealing with diffusion smoothing to efficiently overcome sampling challenges in rugged Boltzmann distributions.
- It leverages a staged training approach with Feynman-Kac SMC dynamics to gradually transition from high to low temperatures.
- Experimentally, PITA substantially improves sample quality and efficiency over traditional methods on molecular benchmarks such as the LJ-13 particle system and peptide conformations.
Progressive Inference-Time Annealing of Diffusion Models for Sampling from Boltzmann Densities
This paper presents Progressive Inference-Time Annealing (PITA), a framework that addresses the challenge of sampling from complex, unnormalized Boltzmann distributions—an issue central to computational chemistry and statistical physics. The method combines two orthogonal probabilistic interpolation schemes: temperature annealing of the target Boltzmann density and diffusion-based smoothing commonly exploited in generative models. This synergy yields an efficient, scalable amortized sampler capable of tackling high-dimensional, highly multi-modal density landscapes prevalent in N-body particle systems and molecular simulations in Cartesian coordinates.
Problem Setting and Motivation
Sampling from the Boltzmann distribution at low temperatures is notoriously difficult due to the landscape’s ruggedness: energy barriers impede mixing and limit feasible data collection via direct molecular dynamics (MD) or Markov Chain Monte Carlo (MCMC). While classical approaches like parallel tempering or Sequential Monte Carlo exploit interpolating sequences of densities for enhanced exploration, these can suffer from mode teleportation and computationally prohibitive time scales for realistic systems. Diffusion models, which rely on learned Stein scores across a noising–denoising path, have demonstrated limited efficacy in this domain—primarily because they require too many expensive energy function evaluations and lack ground-truth score supervision in the absence of data.
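To see why high temperature helps, consider a toy double-well energy: raising the temperature (lowering β) flattens the barrier, leaving more probability mass in the transition region and making mixing between modes far easier. A minimal numerical illustration (the double-well energy and grid are illustrative choices, not from the paper):

```python
import numpy as np

def double_well_energy(x):
    """Toy 1-D double-well energy with minima at x = ±1 and a barrier at x = 0."""
    return (x**2 - 1.0) ** 2

x = np.linspace(-2.0, 2.0, 2001)
energies = double_well_energy(x)

barrier_mass = {}
for beta in (1.0, 10.0):  # beta = 1/T; larger beta means lower temperature
    p = np.exp(-beta * energies)
    p /= p.sum()  # normalize on the grid
    barrier_mass[beta] = p[np.abs(x) < 0.25].sum()  # probability mass near the barrier
    print(f"beta={beta}: mass near barrier = {barrier_mass[beta]:.4f}")
```

At β = 1 the barrier region carries appreciable mass, so a local sampler crosses between wells readily; at β = 10 that mass nearly vanishes, which is exactly the regime where direct MD/MCMC stalls.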
PITA Framework
PITA integrates temperature annealing and diffusion smoothing within a staged training and sampling approach:
- High-Temperature Initialization:
- Start with a high-temperature (low β) Boltzmann density, where the target distribution is smoother and easier to sample from using classical techniques (e.g., short, parallelized MCMC chains).
- Use these samples to train the first-stage diffusion model at the current (high) temperature.
- Progressive Annealing and Model Chaining:
- After training at inverse temperature β_i, use a Feynman-Kac-based inference-time SMC procedure to progressively anneal samples to a slightly lower temperature, i.e., a higher inverse temperature β_{i+1} > β_i.
- This SMC-based inference anneals not only the endpoint of the distribution but also the entire time-marginal path of the diffusion model, maintaining smooth transition and low-variance sample reweighting.
- The annealed sample set then serves as data to train a new diffusion model at β_{i+1}.
- Iteration Until Target Temperature:
- Repeat this procedure until the final, low-temperature target is reached, chaining together diffusion models trained at progressively lower temperatures.
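The annealing move between stages can be caricatured as a single SNIS reweighting/resampling step with the incremental tempering weight w(x) ∝ exp(−(β_{i+1} − β_i) E(x)). The sketch below applies this to a toy Gaussian energy; the actual PITA procedure additionally anneals the whole diffusion-time marginal path, which is omitted here, and `anneal_step` is an illustrative name, not the paper's API:

```python
import numpy as np

rng = np.random.default_rng(0)

def anneal_step(samples, energy_fn, beta_prev, beta_next, rng):
    """One simplified SNIS + multinomial-resampling step from beta_prev to beta_next."""
    log_w = -(beta_next - beta_prev) * energy_fn(samples)
    log_w -= log_w.max()              # stabilize before exponentiating
    w = np.exp(log_w)
    w /= w.sum()
    ess = 1.0 / np.sum(w**2)          # effective sample size of the weights
    idx = rng.choice(len(samples), size=len(samples), p=w)
    return samples[idx], ess

# Toy usage: Gaussian energy E(x) = x^2 / 2, so p_beta is N(0, 1/beta).
samples = rng.normal(scale=np.sqrt(1.0 / 0.5), size=5000)  # draws at beta = 0.5
resampled, ess = anneal_step(samples, lambda x: 0.5 * x**2, 0.5, 0.6, rng)
print(f"ESS after annealing step: {ess:.1f} of {len(samples)}")
```

For this gentle step the resampled set closely matches the β = 0.6 target (variance ≈ 1/0.6) with little loss of effective sample size; larger jumps degrade the ESS rapidly, which motivates the sequential bridging described above.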
Algorithmic Architecture:
- At each temperature step, jointly train:
- A score model s_t(x; θ) ≈ ∇ log p_t(x) using denoising score matching (DSM) and, for low-noise regimes, target score matching objectives.
- An energy-based model U_t(x; η) regressed from the score model, using an energy distillation loss to stabilize SNIS-based reweighting in the Feynman-Kac SMC procedure.
- Feynman-Kac SDE/SMC dynamics are parameterized so that sample weights become constant in the ideal case of perfect score and energy models (mitigating resampling collapse during bridging steps).
- Sample weighting and resampling (adapted from SMC theory) are employed iteratively at each stage to account for the density ratio changes induced by temperature reduction.
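The DSM objective can be sanity-checked on a 1-D Gaussian, where the optimal score is known in closed form: for clean data x0 ~ N(0, 1) noised as x_t = x0 + σε, the DSM regression target is −(x_t − x0)/σ², and the true score is −x/(1 + σ²). This sketch fits the simplest possible score model, a linear function, by least squares (a toy stand-in for the paper's neural score model):

```python
import numpy as np

rng = np.random.default_rng(1)

sigma = 0.5
x0 = rng.normal(size=200_000)            # clean 1-D Gaussian data
xt = x0 + sigma * rng.normal(size=x0.shape)  # noised samples x_t = x0 + sigma * eps
target = -(xt - x0) / sigma**2           # DSM regression target

# Least-squares fit of s(x) = a * x to the DSM target; the minimizer
# should approach the true score slope a* = -1 / (1 + sigma**2).
a_hat = (xt * target).sum() / (xt * xt).sum()
print(f"fitted slope {a_hat:.4f} vs analytic {-1 / (1 + sigma**2):.4f}")
```

The fitted slope converges to the analytic score slope, illustrating why DSM trained only on noised samples still recovers ∇ log p_t without ground-truth score supervision.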
```python
# High-level PITA training loop (pseudocode: the helper functions are
# stand-ins for the MCMC, SMC-annealing, and diffusion-training subroutines).
previous_model, previous_data, beta_prev = None, None, None
for beta in decreasing_temperature_schedule:  # temperature falls, so beta = 1/kT rises
    if previous_model is None:
        data = run_high_temp_mcmc(beta)  # classical sampling is feasible at high temperature
    else:
        data = smc_feynman_kac_anneal(previous_model, previous_data, beta_prev, beta)
    model = train_diffusion_model(data, beta)
    previous_model, previous_data, beta_prev = model, data, beta
```
This chaining regime allows the amortized sampler to bypass energy barriers by mixing via temperature annealing and to avoid mass shifting/teleportation through learned diffusion smoothing.
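The energy distillation idea, regressing a scalar potential whose gradient matches the learned score, can be sketched in one dimension. Here the "score model" is the exact Gaussian score and the energy model is a quadratic, so the regression has a closed form; all names are illustrative, and PITA uses neural networks for both components:

```python
import numpy as np

rng = np.random.default_rng(3)

def score_model(x):
    """Stand-in for a trained s_t(x; theta); here the exact N(0, 1) score."""
    return -x

# Distill an energy U(x) = c * x**2 by gradient matching: fit c so that
# -dU/dx = -2*c*x agrees with the score model in least squares.
x = rng.normal(size=10_000)
c_hat = -(score_model(x) * x).sum() / (2.0 * (x * x).sum())
print(f"distilled coefficient c = {c_hat:.4f} (analytic value 0.5)")
```

The recovered potential U(x) = x²/2 matches the true Gaussian energy up to an additive constant; it is this distilled scalar energy that makes the SNIS weights in the Feynman-Kac procedure computable.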
Experimental Results and Numerical Claims
The empirical evaluation focuses on three molecular systems:
- LJ-13: a 13-particle Lennard-Jones cluster.
- Alanine dipeptide and alanine tripeptide: canonical molecular-conformation benchmarks, treated in full Cartesian coordinates.
Key results include:
- Significantly improved sample quality and mode coverage over all baselines, including classical importance sampling, parallel tempering, and diffusion models trained directly on MD data.
- For LJ-13, PITA achieves a 2-Wasserstein distance of 0.04±0.00 for interatomic distances and 2.26±0.21 for energies, far outperforming competing samplers.
- For peptides, PITA produces conformational ensembles capturing both slow kinetic modes (as seen in TICA plots) and accurate energy distributions. Notably, other diffusion-based samplers fail to scale or collapse to dominant modes at low temperature.
The energy-evaluation budget required by PITA is drastically lower than for direct diffusion-based or MD-based sampling; buffer reuse and staged learning amortize the cost across the intermediate models.
Practical Implications
PITA's framework has direct practical utility in fields requiring principled equilibrium sampling where one cannot rely on long, well-mixed MD chains due to computational cost or limited exploration at low temperature—for example:
- Protein folding and conformational landscape exploration.
- Materials design via atomistic simulation.
- Bayesian inference in physical models with intractable likelihoods.
Beyond domain-specific use, PITA demonstrates a general approach for leveraging learned probabilistic models where direct data access is only feasible for "easier" (smoothed/annealed) distributions, introducing an efficient scheme for amortized importance path learning.
Key Implementation Considerations:
- Energy-based model training must be carefully regularized and stabilized for the SNIS estimator to remain effective and avoid degeneracy.
- Choice of temperature steps is critical—too aggressive annealing leads to low effective sample size; PITA’s sequential bridging mitigates this.
- Architecture and preconditioning (e.g., DiT backbones conditioned on temperature) are crucial for molecular data, as reflected in the strong empirical performance.
- Computational requirements are dominated by energy evaluations during SMC and initial sampling; subsequent training is relatively cheap, which is advantageous for repeated inference/evaluation.
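The effective-sample-size concern is easy to demonstrate: reweighting samples across a large inverse-temperature jump collapses the ESS, while a gentle step preserves it. A toy multivariate-Gaussian illustration (the dimension, step sizes, and sample count are chosen for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)

def ess_fraction(beta_prev, beta_next, dim=50, n=20_000):
    """ESS/N after tempering reweighting for the Gaussian energy E(x) = ||x||^2 / 2."""
    x = rng.normal(scale=np.sqrt(1.0 / beta_prev), size=(n, dim))
    log_w = -(beta_next - beta_prev) * 0.5 * (x**2).sum(axis=1)
    w = np.exp(log_w - log_w.max())  # stabilized incremental weights
    w /= w.sum()
    return 1.0 / (n * np.sum(w**2))

small = ess_fraction(1.0, 1.1)  # gentle inverse-temperature step
large = ess_fraction(1.0, 4.0)  # aggressive jump
print(f"gentle step beta 1.0 -> 1.1: ESS/N = {small:.3f}")
print(f"aggressive step beta 1.0 -> 4.0: ESS/N = {large:.2e}")
```

In 50 dimensions the gentle step retains most of the sample budget, while the aggressive jump concentrates nearly all weight on a handful of samples; PITA's sequence of small bridging steps is designed to stay in the former regime.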
Limitations and Future Directions
The main limitations stem from the need for robust EBM training and careful temperature-schedule design. Additionally, jointly optimizing the score and energy models increases computational and memory overhead. Learning optimal temperature schedules or adaptive annealing strategies is a promising avenue for further progress.
Looking ahead, PITA opens promising directions for scalable, black-box equilibrium sampling in high-dimensional scientific domains, with potential for adaptation to broader classes of distributions and integration with other amortized inference techniques.