- The paper establishes a theoretical equivalence between discrete-time reinforcement learning objectives and continuous-time objectives for diffusion samplers defined by stochastic differential equations, in the limit of vanishing discretization step size.
- Using coarse time discretizations during training improves sample efficiency and reduces computational cost while maintaining performance.
- Empirical validation on standard sampling benchmarks demonstrates the method achieves competitive performance with significantly reduced computational resources.
The paper "From discrete-time policies to continuous-time diffusion samplers: Asymptotic equivalences and faster training" explores the equivalence between discrete-time and continuous-time models used in training diffusion-based neural stochastic differential equations (SDEs). The authors address the problem of sampling from a Boltzmann distribution using stochastic processes when direct samples from the target distribution are unavailable, a common scenario in physical sciences and Bayesian statistics.
Key Contributions
- Theoretical Equivalence: The paper establishes an equivalence, in the limit of infinitesimal time discretization steps, between discrete-time reinforcement learning (RL) objectives and their continuous-time counterparts. This is significant because it bridges discrete-time entropic RL methods, such as Generative Flow Networks (GFlowNets), with continuous-time dynamics described by partial differential equations (PDEs) and path space measures (a schematic version of this correspondence appears after this list).
- Improved Sample Efficiency: The theoretical findings suggest that employing coarse time discretization during the training phase can lead to improved sample efficiency. The approach reduces computational cost while maintaining competitive performance by enabling the use of time-local objectives.
- Empirical Validation: The authors validate the theoretical claims through experiments on standard sampling benchmarks. The results show that the proposed method achieves similar performance to state-of-the-art methods but with reduced computational resources.
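As an illustration of the kind of correspondence the equivalence result formalizes, the schematic equations below (written in standard GFlowNet-style notation, not copied from the paper) show the discrete-time trajectory balance condition over a trajectory (x_0, ..., x_N) together with a continuous-time analogue stated in terms of path space measures:

```latex
% Discrete-time trajectory balance: forward policy p_F, backward policy p_B,
% terminal reward e^{-E(x_N)}, and learned normalizer Z.
Z \prod_{k=0}^{N-1} p_F(x_{k+1} \mid x_k)
  \;=\; e^{-E(x_N)} \prod_{k=0}^{N-1} p_B(x_k \mid x_{k+1}) .

% Continuous-time analogue: the generative path measure \mathbb{P}^F agrees with the
% reverse path measure \mathbb{P}^B reweighted by the terminal reward.
Z \, \frac{\mathrm{d}\mathbb{P}^F}{\mathrm{d}\mathbb{P}^B}\big(X_{[0,T]}\big) \;=\; e^{-E(X_T)} .
```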
Methodology
- Discrete-Time Policies: The paper models discrete-time generative processes as Markov chains with transition kernels. These processes are trained to approximate a target distribution by minimizing divergences between the distributions induced by the generative and reverse processes.
- Continuous-Time Processes: In continuous time, the generative model is specified by an SDE whose drift is parameterized by a neural network; training aims to match the distribution induced by the SDE to the target distribution (a minimal Euler-Maruyama sketch of this construction follows this list).
- Convergence: The authors prove that discrete-time global objectives converge to their continuous-time counterparts defined on trajectory (path) measures, and that the local constraints enforced by GFlowNets approach the Fokker-Planck equation governing the dynamics of the continuous-time marginals.
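The following minimal sketch (illustrative, not the authors' code; the names `Drift` and `simulate` are hypothetical) shows how the two views connect in practice: an Euler-Maruyama discretization of a neural SDE yields a Markov chain whose Gaussian transition kernels have tractable log-densities, which is what the divergence-based objectives consume.

```python
# Minimal sketch: Euler-Maruyama discretization of dX_t = u_theta(X_t, t) dt + sigma dW_t,
# giving a discrete-time Markov chain with Gaussian transition kernels.
import torch
import torch.nn as nn


class Drift(nn.Module):
    """Neural drift u_theta(x, t); the architecture is purely illustrative."""

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(), nn.Linear(hidden, dim)
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, t.expand(x.shape[0], 1)], dim=-1))


def simulate(drift: Drift, x0: torch.Tensor, ts: torch.Tensor, sigma: float = 1.0):
    """Simulate the chain on a (possibly nonuniform) time grid `ts` and accumulate
    the forward log-probability log p_F(trajectory) of the sampled path."""
    x, log_pf, traj = x0, torch.zeros(x0.shape[0]), [x0]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        dt = (t1 - t0).item()
        mean = x + drift(x, t0.view(1)) * dt        # Euler-Maruyama mean
        std = sigma * dt ** 0.5                     # transition kernel is N(mean, std^2 I)
        x_next = mean + std * torch.randn_like(x)
        # Gaussian log-density of the transition p_F(x_next | x), summed over dimensions
        log_pf = log_pf - 0.5 * (
            ((x_next - mean) / std) ** 2 + torch.log(torch.tensor(2 * torch.pi * std ** 2))
        ).sum(-1)
        traj.append(x_next)
        x = x_next
    return torch.stack(traj), log_pf


# Example: a 2-D chain simulated on a coarse, nonuniform grid of 8 steps.
drift = Drift(dim=2)
ts = torch.tensor([0.0, 0.05, 0.15, 0.3, 0.5, 0.7, 0.85, 0.95, 1.0])
traj, log_pf = simulate(drift, torch.zeros(16, 2), ts)
```

A reverse (noising) chain with its own kernels would be simulated analogously, and the two accumulated log-probabilities are what enter the trajectory-level divergences described above.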
Results
- Global Objectives: The paper demonstrates that trajectory-based objectives formulated in discrete time (e.g., trajectory balance and log-variance) converge to their continuous-time counterparts on path space measures (schematic loss forms are sketched after this list).
- Local Objectives: For local objectives, discrete-time detailed balance divergences asymptotically enforce continuous-time constraints, specifically Nelson's identity and the Fokker-Planck equation (both are written out after this list).
- Training Efficiency: Experiments show that training with coarse, nonuniform time discretizations allows models to maintain performance while reducing training cost, which is crucial for high-dimensional problems where simulating long trajectories is computationally prohibitive.
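To make the global objectives concrete, here is a hedged sketch (not the paper's implementation; the function names are hypothetical) of how trajectory balance and log-variance losses can be computed from per-trajectory forward/backward log-probabilities and terminal energies:

```python
# Illustrative discrete-time global losses over a batch of trajectories.
import torch


def trajectory_balance_loss(log_pf, log_pb, neg_energy, log_z):
    """Squared residual of  log Z + log p_F(tau) - log p_B(tau | x_N) - (-E(x_N))."""
    residual = log_z + log_pf - log_pb - neg_energy
    return (residual ** 2).mean()


def log_variance_loss(log_pf, log_pb, neg_energy):
    """Variance of the same residual; the constant log Z drops out, so it need not be learned."""
    residual = log_pf - log_pb - neg_energy
    return residual.var()


# Dummy per-trajectory quantities for a batch of 16 trajectories.
log_pf, log_pb, neg_energy = torch.randn(16), torch.randn(16), torch.randn(16)
loss = trajectory_balance_loss(log_pf, log_pb, neg_energy, log_z=torch.tensor(0.0))
```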
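For reference, the two continuous-time constraints named in the local objectives can be written as follows for an SDE dX_t = u(X_t, t) dt + sigma dW_t with marginal densities p_t; the notation is generic rather than the paper's:

```latex
% Fokker-Planck equation governing the marginals p_t of the generative SDE:
\partial_t p_t(x) \;=\; -\nabla \cdot \big( u(x,t)\, p_t(x) \big)
  \;+\; \tfrac{\sigma^2}{2}\, \Delta p_t(x) .

% Nelson's identity relating the forward drift u and the backward (time-reversed) drift \bar{u}:
u(x,t) \;-\; \bar{u}(x,t) \;=\; \sigma^2 \, \nabla \log p_t(x) .
```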
Applications
This work has potential applications in fields requiring efficient sampling from complex distributions, such as statistical physics, Bayesian inference, and machine learning, particularly when training high-dimensional generative models. The proposed framework can lead to more efficient training strategies by leveraging the continuous-time interpretation of stochastic dynamics and optimizing training objectives accordingly.