Deep Unsupervised Learning via Nonequilibrium Thermodynamics
- Deep unsupervised learning using nonequilibrium thermodynamics is a framework that integrates statistical mechanics concepts like fluctuation theorems and entropy production into model design.
- It employs techniques such as Markov chain dynamics, variational free-energy methods, and diffusion processes to improve learning efficiency and generative fidelity.
- Implementations include modified RBMs, diffusion probabilistic models, and variational autoencoders, highlighting dynamic parameter updates and irreversible work minimization.
Deep unsupervised learning using nonequilibrium thermodynamics encompasses a set of frameworks and methodologies in which the principles and mathematical formalism of nonequilibrium statistical mechanics—especially fluctuating dynamics, entropy production, and work protocols—are harnessed both to design new generative models and to interpret learning algorithms for high-dimensional, unlabeled data. This interdisciplinary approach leverages Markovian and diffusion processes, variational free-energy principles, generalized Onsager structures, and fluctuation theorems to describe and enhance the behavior of neural parameter updates, model architectures, and representation learning far from equilibrium.
1. Nonequilibrium Thermodynamic Foundations of RBM-based Deep Learning
A Restricted Boltzmann Machine (RBM), an essential component in early deep unsupervised learning architectures, provides a canonical example of a generative model with a direct mapping to equilibrium thermodynamics. An RBM consists of binary visible units $v_i$ and binary hidden units $h_j$ with a bipartite "energy" function

$$E_\theta(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j,$$

with parameters $\theta = \{a, b, W\}$. At equilibrium, the (joint) Boltzmann probability is

$$p_\theta(v, h) = \frac{e^{-E_\theta(v, h)}}{Z_\theta},$$

where $Z_\theta = \sum_{v, h} e^{-E_\theta(v, h)}$ is the partition function. The RBM updates correspond to alternating Gibbs sampling steps, and the discrete-time Markov chain satisfies detailed balance, allowing an exact thermodynamic description (Salazar, 2017).
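As a minimal illustration, the following NumPy sketch implements the bipartite energy and one block Gibbs sweep; the layer sizes, parameter scales, and seeding are illustrative assumptions, not taken from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative RBM parameters: 6 visible, 4 hidden units (shapes are assumptions).
a = rng.normal(scale=0.1, size=6)        # visible biases
b = rng.normal(scale=0.1, size=4)        # hidden biases
W = rng.normal(scale=0.1, size=(6, 4))   # couplings

def energy(v, h):
    """Bipartite RBM energy E(v, h) = -a.v - b.h - v.W.h."""
    return -(a @ v) - (b @ h) - v @ W @ h

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_sweep(v):
    """One alternating Gibbs step: sample h | v, then v | h."""
    h = (rng.random(4) < sigmoid(b + v @ W)).astype(float)
    v_new = (rng.random(6) < sigmoid(a + W @ h)).astype(float)
    return v_new, h

v = (rng.random(6) < 0.5).astype(float)
for _ in range(100):
    v, h = gibbs_sweep(v)
print("E(v, h) =", energy(v, h))
```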
Deep unsupervised learning on such models can be interpreted as a nonequilibrium process wherein the model parameters are iteratively updated via protocols that drive the system through a space of distributions, analogous to work being performed on a thermodynamic system. The first law for any trajectory $x_0, x_1, \dots, x_T$ generated under a parameter protocol $\theta_0, \theta_1, \dots, \theta_T$ reads

$$\Delta E = W + Q, \qquad W = \sum_t \left[E_{\theta_{t+1}}(x_t) - E_{\theta_t}(x_t)\right], \qquad Q = \sum_t \left[E_{\theta_{t+1}}(x_{t+1}) - E_{\theta_{t+1}}(x_t)\right],$$

where the work $W$ collects energy changes due to parameter updates at fixed state and the heat $Q$ collects energy changes due to state transitions at fixed parameters.
Fluctuation theorems (Crooks, Jarzynski) connect the distribution of work over learning protocols to free-energy differences and underpin the statistical mechanics of CD learning, annealed importance sampling, and energy-based training.
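The first-law bookkeeping above can be made concrete with a short sketch that accumulates work (energy change from a parameter update at fixed state) and heat (energy change from a Metropolis transition at fixed parameters); the toy quadratic energy, the drifting-parameter "protocol," and the step sizes are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

def E(theta, x):
    """Toy quadratic energy; stands in for the model energy E_theta(x)."""
    return 0.5 * theta * x ** 2

theta, x = 1.0, rng.normal()
E_start = E(theta, x)
W_tot, Q_tot = 0.0, 0.0

for _ in range(1000):
    # Work: energy change due to the parameter update at fixed state.
    theta_new = theta * (1.0 + 0.01 * rng.normal())   # stand-in "learning" protocol
    W_tot += E(theta_new, x) - E(theta, x)
    theta = theta_new

    # Heat: energy change due to a Metropolis transition at fixed parameters.
    x_prop = x + 0.5 * rng.normal()
    if rng.random() < np.exp(-(E(theta, x_prop) - E(theta, x))):
        Q_tot += E(theta, x_prop) - E(theta, x)
        x = x_prop

# First law: the total energy change splits exactly into work plus heat.
print(E(theta, x) - E_start, W_tot + Q_tot)
```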
2. Violation of Detailed Balance and Nonequilibrium Latent Cycles
Recent developments break away from the equilibrium RBM paradigm by introducing independent parameterizations for the conditional transitions $q_\phi(h \mid v)$ and $p_\theta(v \mid h)$. This leads to Markov chains with nonvanishing probability currents and persistent entropy production, as the steady state is no longer described by a symmetric (detailed balance) energy function (Baiesi et al., 12 Dec 2025). The model

$$T(v' \mid v) = \sum_h p_\theta(v' \mid h)\, q_\phi(h \mid v),$$

with $q_\phi(h \mid v)$ no longer tied to $p_\theta(v \mid h)$ through a shared joint energy, produces nonequilibrium steady states characterized by

$$\Sigma = \sum_{v, v'} \pi(v)\, T(v' \mid v) \ln \frac{\pi(v)\, T(v' \mid v)}{\pi(v')\, T(v \mid v')} \;\geq\; 0,$$

where $\Sigma$ is the entropy production and $\pi$ is the stationary distribution of $T$.
Nonequilibrium learning protocols drive the system into regions of parameter space where self-transition probabilities in the latent codes decrease, probability currents and persistent cycles in the hidden state emerge, and generative fidelity improves. Training dynamically steers the model away from reversible (equilibrium) points; finite entropy production is a necessary signature of effective model learning and generative power, a feature absent in equilibrium RBMs (Baiesi et al., 12 Dec 2025).
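A small numerical sketch of the steady-state entropy production defined above: two independently parameterized conditionals compose into the visible-to-visible kernel, and $\Sigma$ is computed from its stationary distribution. The state-space sizes and the random conditionals are assumptions chosen for illustration; $\Sigma$ vanishes only when detailed balance holds.

```python
import numpy as np

rng = np.random.default_rng(2)

n_v, n_h = 4, 3
# Independently parameterized conditionals (row-stochastic, drawn at random here).
q_h_given_v = rng.random((n_v, n_h)); q_h_given_v /= q_h_given_v.sum(1, keepdims=True)
p_v_given_h = rng.random((n_h, n_v)); p_v_given_h /= p_v_given_h.sum(1, keepdims=True)

# Visible-to-visible kernel T[v, v'] = sum_h p(v'|h) q(h|v).
T = q_h_given_v @ p_v_given_h

# Stationary distribution: left eigenvector of T for eigenvalue 1.
w, V = np.linalg.eig(T.T)
pi = np.real(V[:, np.argmax(np.real(w))])
pi /= pi.sum()

# Steady-state entropy production per step (zero iff detailed balance holds).
flux = pi[:, None] * T                      # flux[v, v'] = pi(v) T(v'|v)
sigma = np.sum(flux * np.log(flux / flux.T))
print("entropy production per step:", sigma)
```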
3. Diffusion Probabilistic Models: Deep Learning as Reverse Nonequilibrium Transformation
Diffusion probabilistic models generalize the nonequilibrium view by considering a forward Markov process that gradually destroys data structure through a sequence of small stochastic steps, driving the input distribution towards a tractable, unstructured prior. The reverse process, parameterized by a deep neural network, learns to undo this diffusion and reconstruct data (Sohl-dickstein et al., 2015). The forward kernel for continuous data is

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\right),$$

with $0 < \beta_t \ll 1$ and $q(x_T) \approx \mathcal{N}(0, I)$, a simple prior.
The generative process is the learned time-reversal,

$$p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t), \qquad p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right).$$

A variational lower bound (ELBO) is derived by treating the forward noising process as an approximate posterior, and training proceeds by minimizing the sum of stepwise forward–backward KL divergences. The framework admits a nonequilibrium thermodynamic interpretation: the forward process mimics quasi-static "heating," and backward KL terms correspond to local entropy production. In the small-step limit ($\beta_t \to 0$), the process approaches reversibility and the variational bound tightens.
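A brief sketch of the forward noising process, assuming a simple linear $\beta_t$ schedule (the schedule and dimensions are illustrative, not those of the cited paper); the closed-form Gaussian marginal is a standard consequence of composing the per-step kernels.

```python
import numpy as np

rng = np.random.default_rng(3)

T = 1000
betas = np.linspace(1e-4, 2e-2, T)     # assumed linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)

def q_sample_step(x_prev, t):
    """One forward step: x_t ~ N(sqrt(1 - beta_t) x_{t-1}, beta_t I)."""
    return np.sqrt(1.0 - betas[t]) * x_prev + np.sqrt(betas[t]) * rng.normal(size=x_prev.shape)

def q_sample_marginal(x0, t):
    """Marginal of the forward chain: x_t ~ N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I)."""
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * rng.normal(size=x0.shape)

x0 = rng.normal(size=8)                # a toy "data" vector
x1 = q_sample_step(x0, 0)              # nearly identical to x0
x_end = q_sample_marginal(x0, T - 1)   # essentially a sample from the N(0, I) prior
print(np.linalg.norm(x1 - x0), np.linalg.norm(x_end - x0))
```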
Diffusion models achieve both tractability and statistical power, supporting rapid sampling, explicit likelihoods, and robust conditional inference. Empirical results validate the framework across synthetic and complex real-world datasets (Sohl-dickstein et al., 2015).
4. Onsager Principle and Learning Nonequilibrium Dynamics
The Onsager principle provides a variational foundation for the evolution equations of dissipative, possibly nonlinear, stochastic systems. In the generalized stochastic Onsager formalism, a reduced variable $Z$ evolves as

$$dZ = -\left[M(Z) + W(Z)\right] \nabla F(Z)\, dt + \sigma(Z)\, dB_t,$$

where $M(Z)$ is symmetric positive semi-definite (dissipative), $W(Z)$ is antisymmetric (Hamiltonian), $F(Z)$ is the free energy, and $\sigma(Z)$ is the noise amplitude (Chen et al., 2023). Data-driven methods can jointly learn:
- The map from microscopic trajectory observations to reduced coordinates (“thermodynamic closure”)
- Parameterizations for $M$, $W$, $F$, and $\sigma$ using neural architectures
Training uses maximum likelihood over the Euler–Maruyama discretization of trajectory pairs, augmented by a reconstruction loss via a decoder. The learned system reliably recovers macroscopic landscapes such as energy barriers and transition states, as demonstrated on polymer stretching and epidemic spread models. This approach yields interpretable, structured dynamics and control strategies that respect nonequilibrium constraints.
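A minimal sketch of the Euler–Maruyama transition likelihood used in such training, for a fixed two-dimensional reduced variable: the hand-written $M$, $W$, $F$, and constant $\sigma$ stand in for the neural parameterizations, and the step size and double-well free energy are assumptions.

```python
import numpy as np

dt, sigma = 1e-2, 0.1
M = np.array([[1.0, 0.0], [0.0, 0.5]])       # symmetric positive semi-definite (dissipative)
Wmat = np.array([[0.0, 0.3], [-0.3, 0.0]])   # antisymmetric (Hamiltonian)

def grad_F(z):
    """Gradient of a double-well free energy F(z) = (z0^2 - 1)^2 + z1^2."""
    return np.array([4.0 * z[0] * (z[0] ** 2 - 1.0), 2.0 * z[1]])

def drift(z):
    return -(M + Wmat) @ grad_F(z)

def euler_maruyama_log_lik(traj):
    """Gaussian transition log-likelihood of consecutive pairs under the discretized SDE."""
    ll = 0.0
    for z, z_next in zip(traj[:-1], traj[1:]):
        mean = z + drift(z) * dt
        var = sigma ** 2 * dt
        ll += -0.5 * np.sum((z_next - mean) ** 2) / var - len(z) * 0.5 * np.log(2 * np.pi * var)
    return ll

# Simulate a short trajectory under the same dynamics, then score it.
rng = np.random.default_rng(4)
z = np.array([0.8, -0.2])
traj = [z]
for _ in range(200):
    z = z + drift(z) * dt + sigma * np.sqrt(dt) * rng.normal(size=2)
    traj.append(z)
print("log-likelihood:", euler_maruyama_log_lik(traj))
```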
Variational Onsager neural networks (VONNs) extend this to the PDE setting, learning unknown free-energy and dissipation potentials from field and process measurements while enforcing convexity and compliance with the second law (Huang et al., 2021). Inputs are passed through integrable and input-convex architectures with losses based on PDE residuals and adaptive weighting.
5. Nonequilibrium Representation Learning and Many-Body Systems
Theoretical equivalence can be drawn between representation learning in deep bottleneck architectures—especially variational autoencoders (VAEs)—and nonequilibrium thermodynamic processes. The VAE evidence lower bound

$$\mathcal{L}(x) = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)$$

maps onto a nonequilibrium variational free energy,

$$\mathcal{F}(x) = -\mathcal{L}(x) = -\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] + D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right),$$

where the KL divergence plays the role of irreversible entropy production (Zhong et al., 2020). This mapping enables new metrics for quantifying classification, memory, discrimination, and novelty in many-body systems driven far from equilibrium; VAEs reveal self-organization with higher sensitivity than macroscopic thermodynamic observables such as work absorption.
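The decomposition can be seen term by term in a short sketch for a Gaussian encoder and decoder; the closed-form KL against a standard-normal prior is standard, while the toy dimensions and stand-in network outputs are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ); plays the entropy-production role in the mapping."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

def recon_nll(x, x_mean, noise_var=0.1):
    """Gaussian reconstruction negative log-likelihood (decoder with fixed variance)."""
    d = x.size
    return 0.5 * np.sum((x - x_mean) ** 2) / noise_var + 0.5 * d * np.log(2 * np.pi * noise_var)

# Toy encoder/decoder outputs (stand-ins for neural-network forward passes).
x = rng.normal(size=16)                               # a data vector
mu, logvar = 0.3 * x[:4], np.full(4, -1.0)            # encoder: 4-dim latent code
z = mu + np.exp(0.5 * logvar) * rng.normal(size=4)    # reparameterized sample
x_mean = np.repeat(z, 4)                              # toy "decoder" mean

free_energy = recon_nll(x, x_mean) + gaussian_kl(mu, logvar)   # variational free energy = -ELBO
print("variational free energy (-ELBO):", free_energy)
```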
6. Operational and Algorithmic Consequences
A key principle across these nonequilibrium thermodynamic frameworks is that unsupervised training, including Contrastive Divergence (CD) in RBMs and entropy-minimization in diffusion and Onsager models, equates to minimizing the irreversible work injected by parameter updates. In CD, for $k$ Gibbs steps, the total entropy production

$$\Sigma_k = \Delta S - \beta \langle Q \rangle \;\geq\; 0$$

measures the entropy gain minus absorbed heat, quantifying irreversibility (Salazar, 2017). Annealed importance sampling and generative diffusion leverage exponential averages of the work (via the Jarzynski equality $\langle e^{-\beta W} \rangle = e^{-\beta \Delta F}$) for explicitly estimating normalization constants and model likelihoods.
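A compact sketch of the Jarzynski-style estimator underlying annealed importance sampling: work is accumulated along an annealing protocol between two tractable one-dimensional distributions, and the exponential average recovers the partition-function ratio. The energies, schedule, and Metropolis sampler are assumptions chosen so the exact answer is known.

```python
import numpy as np

rng = np.random.default_rng(6)

def E(x, lam, s=2.0):
    """Energy interpolating between a unit Gaussian (lam=0) and a width-s Gaussian (lam=1)."""
    return (1.0 - lam) * 0.5 * x ** 2 + lam * 0.5 * (x / s) ** 2

n_samples, n_steps = 2000, 200
lams = np.linspace(0.0, 1.0, n_steps + 1)
works = np.zeros(n_samples)

for i in range(n_samples):
    x = rng.normal()                                  # exact sample from the lam=0 distribution
    for lam_prev, lam in zip(lams[:-1], lams[1:]):
        works[i] += E(x, lam) - E(x, lam_prev)        # work from the protocol step
        x_prop = x + rng.normal()                     # Metropolis move at fixed lam (heat)
        if rng.random() < np.exp(E(x, lam) - E(x_prop, lam)):
            x = x_prop

# Jarzynski / AIS: <exp(-W)> estimates the partition-function ratio Z1 / Z0 = s.
print("estimated Z1/Z0:", np.exp(-works).mean(), " exact:", 2.0)
```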
Entropy production emerges as a tunable resource: in nonequilibrium models, increasing irreversibility (finite entropy production) correlates with improved generative performance, faster mixing, and more robust exploration of the data manifold (Baiesi et al., 12 Dec 2025). Algorithmic analogies to minimum-dissipation thermodynamic protocols suggest the utility of adaptive learning rates, counterdiabatic driving, and control-inspired update schemes that mirror efficient non-equilibrium transformations.
7. Implications and Scope
The intersection of nonequilibrium thermodynamics and deep unsupervised learning provides both a quantitative interpretative apparatus and a principled design space for generative models. This framework unifies previously disparate approaches—energy-based models, diffusion models, variational frameworks, and neural PDE discovery—while supplying operational understanding of training efficiency, irreversibility, and self-organization. Nonequilibrium tools, such as entropy production, probability currents, and fluctuation theorems, become not only diagnostic metrics but design objectives.
Empirical evidence, across modalities—from spin glasses to polymers, epidemic models, and visual data—demonstrates the capacity of nonequilibrium formalism to both diagnose and enhance learning and generative performance. Future research directions include developing optimal dissipation-minimizing protocols, scalable inference in large-scale stochastic dynamical systems, and hybrid frameworks that fuse nonequilibrium physics with advanced deep architectures for both scientific modeling and complex data-driven generative tasks (Salazar, 2017, Sohl-dickstein et al., 2015, Baiesi et al., 12 Dec 2025, Chen et al., 2023, Huang et al., 2021, Zhong et al., 2020).