Donsker–Varadhan Variational Formula

Updated 9 June 2026

The Donsker–Varadhan variational formula is a core result that provides a dual representation of the KL divergence, unifying concepts in probability, large deviations, and spectral theory.
The formula extends to applications in statistical mechanics, risk-sensitive control, and modern machine learning, with concrete examples in reinforcement learning and density estimation.
Recent advances integrate the DV framework into neural network divergence estimation and robust variational inference, highlighting its utility in high-dimensional and complex stochastic systems.

The Donsker–Varadhan variational formula is a foundational result in probability theory, statistical mechanics, large deviations, spectral theory, risk-sensitive control, and modern machine learning. It provides a dual representation of quantities such as the Kullback–Leibler divergence, principal eigenvalues of positive operators, and large deviation rate functions for empirical measures of Markov processes. The formula and its extensions unify concepts from convex analysis, information theory, stochastic control, and statistical inference, with modern applications ranging from reinforcement learning policy optimization to neural network-based divergence estimation.

1. Classical Variational Principle

The classical Donsker–Varadhan (DV) variational formula gives a dual representation of the Kullback–Leibler divergence (relative entropy) between two probability measures. For probability measures $Q$ and $P$ on a measurable space $(\Omega, \mathcal{F})$ with $Q \ll P$, the DV formula is

$D_\mathrm{KL}(Q\|P) = \sup_{f\in\mathcal{M}_b(\Omega)} \left\{ \mathbb{E}_Q[f] - \log \mathbb{E}_P[e^{f}] \right\},$

where $\mathcal{M}_b(\Omega)$ denotes the bounded measurable functions on $\Omega$. The supremum is attained at $f^*(x) = \log\frac{dQ}{dP}(x)$ whenever this function is bounded, and the maximizer is unique only up to an additive constant because the objective is invariant under $f \mapsto f + c$. The representation holds on general measurable spaces, and the class of test functions can be restricted to bounded continuous or Lipschitz functions where appropriate.

For Markov processes with generator $L$, the rate function for empirical measures admits an analogous variational formula:

$I(\mu) = -\inf_{u \in \operatorname{Dom}(L),\, u>0} \int \frac{L u}{u}\, d\mu,$

where the infimum is over strictly positive functions in the domain of $L$ (Birrell et al., 2020, Dupuis et al., 2013).
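As a small numerical illustration of the KL duality, the sketch below checks the formula on a four-point space. The distributions `q` and `p` and the helper names are hypothetical choices made for this example only; it evaluates the DV objective at $f^* = \log\frac{dQ}{dP}$, at a shifted copy of $f^*$, and at random test functions.

```python
import numpy as np

def dv_objective(f, q, p):
    """DV objective E_Q[f] - log E_P[exp(f)] for discrete Q and P."""
    return float(np.dot(q, f) - np.log(np.dot(p, np.exp(f))))

def kl_divergence(q, p):
    """KL(Q || P) for strictly positive probability vectors."""
    return float(np.sum(q * np.log(q / p)))

# Hypothetical toy distributions on four points (assumption for illustration).
q = np.array([0.1, 0.2, 0.3, 0.4])
p = np.array([0.25, 0.25, 0.25, 0.25])

kl = kl_divergence(q, p)
f_star = np.log(q / p)                          # candidate optimizer log dQ/dP
dv_at_optimum = dv_objective(f_star, q, p)
dv_shifted = dv_objective(f_star + 3.0, q, p)   # additive constants do not matter

# Random test functions give lower bounds on the KL divergence.
rng = np.random.default_rng(0)
max_random = max(dv_objective(rng.normal(size=4), q, p) for _ in range(1000))
```

Every test function yields a lower bound on the divergence, which is why the representation can be used as a training objective: any improvement in the objective tightens the bound.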

2. Extensions: Rényi and Generalized Divergences

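The DV formula generalizes to the Rényi $\alpha$-divergences. For $\alpha\neq 0,1$, the order-$\alpha$ Rényi divergence $R_\alpha(Q\|P)$, normalized as $R_\alpha(Q\|P) = \frac{1}{\alpha(\alpha-1)}\log \mathbb{E}_P\left[\left(\frac{dQ}{dP}\right)^{\alpha}\right]$ when $Q \ll P$, admits the following variational form:

$R_\alpha(Q\|P) = \sup_{g\in\Gamma} \left\{ \frac{1}{\alpha-1}\log \mathbb{E}_Q\left[e^{(\alpha-1)g}\right] - \frac{1}{\alpha}\log \mathbb{E}_P\left[e^{\alpha g}\right] \right\}$

for any function class $\Gamma$ containing the bounded measurable functions. At optimality, $g^* = \log\frac{dQ}{dP}$ in the absolutely continuous case, and in the limit $\alpha \to 1$, the formula recovers the classical DV variational representation for Kullback–Leibler divergence (Birrell et al., 2020).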
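The Rényi variational form can be checked numerically in the same way. The sketch below assumes the normalization $R_\alpha(Q\|P) = \frac{1}{\alpha(\alpha-1)}\log \mathbb{E}_P[(dQ/dP)^\alpha]$ and the objective $\frac{1}{\alpha-1}\log \mathbb{E}_Q[e^{(\alpha-1)g}] - \frac{1}{\alpha}\log \mathbb{E}_P[e^{\alpha g}]$; the toy distributions and function names are illustrative assumptions.

```python
import numpy as np

def renyi_divergence(q, p, alpha):
    """R_alpha(Q||P) = log(sum q^alpha p^(1-alpha)) / (alpha (alpha - 1))."""
    moment = np.sum(q**alpha * p**(1.0 - alpha))
    return float(np.log(moment) / (alpha * (alpha - 1.0)))

def renyi_dv_objective(g, q, p, alpha):
    """Variational objective for a test function g on a finite space."""
    term_q = np.log(np.dot(q, np.exp((alpha - 1.0) * g))) / (alpha - 1.0)
    term_p = np.log(np.dot(p, np.exp(alpha * g))) / alpha
    return float(term_q - term_p)

# Hypothetical toy distributions (assumption for illustration).
q = np.array([0.1, 0.2, 0.3, 0.4])
p = np.array([0.25, 0.25, 0.25, 0.25])
g_star = np.log(q / p)
rng = np.random.default_rng(0)

gaps = {}
for alpha in (0.5, 2.0):
    target = renyi_divergence(q, p, alpha)
    at_optimum = renyi_dv_objective(g_star, q, p, alpha)
    best_random = max(renyi_dv_objective(rng.normal(size=4), q, p, alpha)
                      for _ in range(1000))
    gaps[alpha] = (target, at_optimum, best_random)

# Near alpha = 1 the divergence approaches the KL divergence.
kl = float(np.sum(q * np.log(q / p)))
renyi_near_one = renyi_divergence(q, p, 1.0 + 1e-6)
```

For both tested orders the objective at $g^*$ matches the divergence and random test functions stay below it, consistent with the Hölder-type inequality behind the formula.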

A maxitive analogue of the Donsker–Varadhan formula exists in possibility theory, substituting integrals with suprema and KL divergence with the max-relative entropy, formalizing a max-plus (tropical) duality essential for possibilistic variational inference (Singh et al., 26 Nov 2025).
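To make the max-plus substitution concrete, the sketch below checks one elementary identity of this type on a finite set: $D_{\max}(q\|p) = \sup_f \{ \max_x [f(x) + \log q(x)] - \max_x [f(x) + \log p(x)] \}$, where $D_{\max}(q\|p) = \max_x \log\frac{q(x)}{p(x)}$ and $q$, $p$ are possibility functions with maximum 1. This form is an assumption chosen for illustration and is not necessarily the exact statement in the cited work.

```python
import numpy as np

def max_relative_entropy(q, p):
    """D_max(q || p) = max_x log(q(x) / p(x))."""
    return float(np.max(np.log(q / p)))

def maxitive_objective(f, q, p):
    """Max-plus analogue of E_Q[f] - log E_P[exp(f)]: integrals become maxima."""
    return float(np.max(f + np.log(q)) - np.max(f + np.log(p)))

# Hypothetical possibility functions, each normalized to have maximum 1.
q = np.array([1.0, 0.5, 0.2, 0.8])
p = np.array([0.3, 1.0, 0.6, 0.4])

d_max = max_relative_entropy(q, p)
at_optimum = maxitive_objective(-np.log(p), q, p)   # f* = -log p attains the supremum

rng = np.random.default_rng(0)
best_random = max(maxitive_objective(rng.normal(size=4), q, p) for _ in range(1000))
```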

3. Spectral and Large Deviations Theory

The DV formula is the cornerstone for analyzing large deviations of empirical measures of Markov processes:

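Diffusions: For a reversible diffusion $X_t$ with infinitesimal generator

$L = \Delta - \nabla V \cdot \nabla$

and reversible invariant measure $\pi(dx) \propto e^{-V(x)}\,dx$, the rate function for occupation-time large deviations is

$I(\mu) = \int \left|\nabla \sqrt{f}\right|^2 d\pi, \qquad f = \frac{d\mu}{d\pi},$

with $I(\mu) = +\infty$ when $\mu$ is not absolutely continuous with respect to $\pi$. This formula connects to the Freidlin–Wentzell action in small noise limits (Bertini et al., 2022).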

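Jump Markov processes: For pure jump Markov processes on a compact Polish space $S$ with generator

$L f(x) = \int_S \left( f(y) - f(x) \right) q(x, dy),$

where $q(x, dy)$ is the jump rate kernel, and reversible invariant measure $\pi$, the explicit DV rate function is:

$I(\mu) = \frac{1}{2} \int_S \int_S \left( \sqrt{\theta(y)} - \sqrt{\theta(x)} \right)^2 q(x, dy)\, \pi(dx),$

equivalently expressed via the Dirichlet form $I(\mu) = \mathcal{E}(\sqrt{\theta}, \sqrt{\theta}) = \langle -L\sqrt{\theta}, \sqrt{\theta} \rangle_{L^2(\pi)}$ for $\theta = \frac{d\mu}{d\pi}$ (Dupuis et al., 2013).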
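For a finite-state reversible jump process the two descriptions of the rate function can be compared directly. The sketch below is a toy check under stated assumptions: a five-state chain with hypothetical symmetric conductances `c`, rates $q(i,j) = c_{ij}/\pi_i$ (so detailed balance holds), and the Dirichlet-form expression $\frac{1}{2}\sum_{i,j}\pi_i q(i,j)(\sqrt{\theta_j}-\sqrt{\theta_i})^2$ with $\theta = \mu/\pi$. It compares that value with $-\sum_i \mu_i (Lu)_i/u_i$ evaluated at $u = \sqrt{\theta}$ and at random positive vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
pi = rng.dirichlet(np.ones(n))               # reversible invariant measure
c = rng.uniform(0.5, 1.5, size=(n, n))
c = (c + c.T) / 2.0                          # symmetric conductances
np.fill_diagonal(c, 0.0)
rates = c / pi[:, None]                      # pi_i q(i,j) = c_ij = pi_j q(j,i)
gen = rates - np.diag(rates.sum(axis=1))     # generator matrix L

def dv_functional(u, mu):
    """-sum_i mu_i (L u)_i / u_i for a strictly positive vector u."""
    return float(-np.dot(mu, (gen @ u) / u))

def dirichlet_rate(mu):
    """Dirichlet form of sqrt(d mu / d pi)."""
    s = np.sqrt(mu / pi)
    diff = s[None, :] - s[:, None]           # diff[i, j] = s_j - s_i
    return float(0.5 * np.sum(pi[:, None] * rates * diff**2))

mu = rng.dirichlet(np.ones(n))               # a test probability measure
rate = dirichlet_rate(mu)
value_at_sqrt = dv_functional(np.sqrt(mu / pi), mu)
best_random = max(dv_functional(np.exp(rng.normal(size=n)), mu)
                  for _ in range(1000))
```

In this reversible setting the optimizer is available in closed form, so no numerical minimization over $u$ is needed; without reversibility the infimum generally has to be computed.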

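Spectral theory: The DV formula arises in characterizing principal eigenvalues of positive (linear or nonlinear) operators via

$\lambda^* = \sup_{\mu \in \mathcal{P}(S)} \inf_{u > 0} \int \frac{A u}{u}\, d\mu,$

where $A$ is the positive operator, $\mathcal{P}(S)$ is the set of probability measures on the state space $S$, and the infimum is over strictly positive functions. This characterization is central to risk-sensitive control and the spectral theory of elliptic operators (Arapostathis et al., 2013, Anantharam et al., 2015, Arapostathis et al., 2019).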
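In the simplest finite-dimensional case, a positive matrix $A$, the principal eigenvalue is the Perron root $\rho(A)$ and the characterization reads $\rho(A) = \sup_\mu \inf_{u>0} \sum_i \mu_i (Au)_i/u_i$. The sketch below, with a hypothetical random matrix, checks two facts under that assumption: the ratios $(Au)_i/u_i$ are constant at the right Perron vector (the Collatz–Wielandt picture), and for $\mu_i \propto u_i v_i$ built from the right and left Perron vectors the weighted ratio never drops below $\rho(A)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
A = rng.uniform(0.1, 1.0, size=(n, n))        # entrywise positive matrix

eigvals, right = np.linalg.eig(A)
k = np.argmax(eigvals.real)
rho = eigvals[k].real                         # Perron root
u = np.abs(right[:, k].real)                  # right Perron vector

eigvals_left, left = np.linalg.eig(A.T)
v = np.abs(left[:, np.argmax(eigvals_left.real)].real)   # left Perron vector

mu = u * v / np.dot(u, v)                     # maximizing probability vector

def weighted_ratio(x, weights):
    """sum_i weights_i (A x)_i / x_i for a strictly positive vector x."""
    return float(np.dot(weights, (A @ x) / x))

collatz_ratios = (A @ u) / u                  # all equal to rho
value_at_u = weighted_ratio(u, mu)
worst_random = min(weighted_ratio(np.exp(rng.normal(size=n)), mu)
                   for _ in range(1000))
```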

4. Applications in Modern Machine Learning

The Donsker–Varadhan variational principle underlies numerous algorithms and estimator designs in contemporary machine learning:

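Reinforcement learning: The DV formula underlies the equivalence between entropy-regularized policy gradients and soft Q-learning, with the optimal policy taking the Gibbs (Boltzmann) form:

$\pi^*(a \mid s) = \frac{\exp\left(Q^*(s,a)/\tau\right)}{\sum_{a'} \exp\left(Q^*(s,a')/\tau\right)},$

where $Q^*$ is the soft action-value function and $\tau > 0$ is the entropy-regularization temperature. The soft Bellman operator replaces the hard maximum over actions with a log-sum-exp (soft maximum), which is the Legendre transform of the negative-entropy regularizer (Richemond et al., 2017).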
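The link between the log-sum-exp value and the Gibbs policy can be verified at a single state. The sketch below assumes hypothetical action values `q_values` and temperature `tau`, and checks that $\tau \log \sum_a e^{Q(a)/\tau}$ equals the maximum of $\mathbb{E}_\pi[Q] + \tau H(\pi)$ over policies, with the maximum attained by the softmax policy.

```python
import numpy as np

def soft_value(q_values, tau):
    """tau * log sum_a exp(Q(a) / tau), computed with a max shift for stability."""
    m = np.max(q_values)
    return float(m + tau * np.log(np.sum(np.exp((q_values - m) / tau))))

def gibbs_policy(q_values, tau):
    """Softmax (Boltzmann) policy proportional to exp(Q(a) / tau)."""
    z = np.exp((q_values - np.max(q_values)) / tau)
    return z / z.sum()

def regularized_objective(policy, q_values, tau):
    """E_pi[Q] + tau * H(pi) for a strictly positive policy."""
    entropy = -np.sum(policy * np.log(policy))
    return float(np.dot(policy, q_values) + tau * entropy)

# Hypothetical action values at one state and a temperature (assumptions).
q_values = np.array([1.0, 2.0, 0.5, 1.5])
tau = 0.7

pi_star = gibbs_policy(q_values, tau)
v_soft = soft_value(q_values, tau)
v_at_gibbs = regularized_objective(pi_star, q_values, tau)

rng = np.random.default_rng(3)
best_random = max(regularized_objective(rng.dirichlet(np.ones(4)), q_values, tau)
                  for _ in range(1000))
```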

Density estimation by deep networks: Plugging deep neural networks into the DV representation yields a direct estimator of data densities. For example, with $Q$ the empirical distribution of the data and $P$ uniform, training a network $f_\theta$ to maximize

$\mathbb{E}_Q[f_\theta] - \log \mathbb{E}_P\left[e^{f_\theta}\right]$

produces $f_\theta$ approximating the $\log$-density of the data up to an additive constant. This approach is reported to be competitive for high-dimensional density estimation, to outperform kernel density estimation, and to improve downstream tasks such as anomaly detection and classification (Park et al., 2021).
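The mechanism can be seen in a deliberately small setting where a lookup table stands in for the neural network. The sketch below is not the method of the cited paper: it assumes five discrete bins, a hypothetical data distribution `true_probs`, and plain gradient ascent on the DV objective with $Q$ empirical and $P$ uniform. After normalization, the learned table recovers the empirical log-probabilities.

```python
import numpy as np

rng = np.random.default_rng(4)
true_probs = np.array([0.1, 0.2, 0.4, 0.2, 0.1])       # hypothetical data law
samples = rng.choice(5, size=2000, p=true_probs)

q = np.bincount(samples, minlength=5) / samples.size   # empirical Q
p = np.full(5, 0.2)                                     # uniform reference P

f = np.zeros(5)                 # tabular test function, stand-in for a network
step = 1.0
for _ in range(3000):
    w = p * np.exp(f)
    w /= w.sum()                # exp(f)-tilted version of P
    f += step * (q - w)         # gradient of E_Q[f] - log E_P[exp(f)]

dv_value = float(np.dot(q, f) - np.log(np.dot(p, np.exp(f))))
kl_empirical = float(np.sum(q * np.log(q / p)))

# f is determined only up to an additive constant, so normalize before comparing.
log_density_estimate = f - np.log(np.sum(np.exp(f)))
```

The objective is concave in the table entries, so gradient ascent converges here; with a neural network the objective is no longer concave in the parameters and the expectations are replaced by minibatch averages.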

Variational estimation of Rényi divergences: The Rényi-DV formula provides the foundation for statistically consistent neural network estimators of $R_\alpha(Q\|P)$. Universal approximation of the test function by neural networks is sufficient for consistency, and the estimators are reported to remain effective in problems with thousands of dimensions (Birrell et al., 2020).
Possibilistic inference: Replacing integrals with suprema and the KL divergence with the max-relative entropy allows application to imprecise probability and robust inference (Singh et al., 26 Nov 2025).

5. Risk-Sensitive Control, Dynamic Programming, and Collatz–Wielandt Duality

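The variational structure of the DV formula generalizes to the principal eigenvalue of controlled positive operators, yielding risk-sensitive reward rates:

$\lambda^* = \sup_{\mu} \left\{ \int r\, d\mu - \int D\left( \tilde{p}(\cdot \mid x, a) \,\|\, p(\cdot \mid x, a) \right) \mu(dx, da) \right\},$

where $\mu$ runs over ergodic occupation measures induced by admissible policies $\pi$ and tilted transition kernels $\tilde{p}$, $p$ is the nominal controlled transition kernel, and $D(\cdot\|\cdot)$ is the Kullback–Leibler divergence; $r$ denotes per-stage reward (Arapostathis et al., 2019, Anantharam et al., 2015).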
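For an uncontrolled finite chain the risk-sensitive reward rate is the logarithm of the Perron root of $\operatorname{diag}(e^{r})P$, and the variational value is "stationary average reward minus stationary average KL cost of changing the transition kernel". The sketch below checks this on a hypothetical four-state chain; the tilted kernel built from the Perron vector is the standard candidate optimizer, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
P = rng.dirichlet(np.ones(n), size=n)        # nominal transition matrix
r = rng.normal(size=n)                       # per-stage reward r(i)

M = np.exp(r)[:, None] * P                   # positive operator diag(e^r) P
eigvals, vecs = np.linalg.eig(M)
k = np.argmax(eigvals.real)
rho = eigvals[k].real                        # Perron root
h = np.abs(vecs[:, k].real)                  # right Perron vector

# Tilted kernel: rows sum to one because M h = rho h.
P_tilt = M * h[None, :] / (rho * h[:, None])

def stationary(T):
    """Stationary distribution of an irreducible stochastic matrix T."""
    w, v = np.linalg.eig(T.T)
    s = np.abs(v[:, np.argmax(w.real)].real)
    return s / s.sum()

def dv_value(T):
    """Stationary reward under T minus stationary KL(T(i,.) || P(i,.))."""
    nu = stationary(T)
    kl_rows = np.sum(T * np.log(T / P), axis=1)
    return float(np.dot(nu, r - kl_rows))

log_rho = float(np.log(rho))
value_tilted = dv_value(P_tilt)              # attains log(rho)
value_nominal = dv_value(P)                  # plain average reward, a lower bound
best_random = max(dv_value(rng.dirichlet(np.ones(n), size=n)) for _ in range(200))
```

With controls, the same computation is carried out per policy and the supremum additionally ranges over policies.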

The abstract Collatz–Wielandt duality, extended to nonlinear and controlled contexts, provides the min-max structure underlying the DV formula and its generalizations (Arapostathis et al., 2013).

6. Further Structural and Computational Developments

The DV variational principle also:

Underlies large deviation analysis for Markov processes with degenerate rates (e.g., absorbing states) and self-interacting pure jump processes, yielding explicit and non-convex large deviation rate functions in the latter case (Basile et al., 2013, Budhiraja et al., 16 Oct 2025).
Admits max-plus (tropical) analogues for possibility theory, crucial for robust variational inference frameworks (Singh et al., 26 Nov 2025).
Provides lower bounds for principal eigenvalues of elliptic operators in terms of exit times, offering both classical moment-based and quantile-based (probabilistic) bounds with implications for metastability and Monte Carlo spectral estimation (Lu et al., 2016).
Facilitates explicit computation for piecewise deterministic Markov processes (PDMPs) such as the zig-zag sampler, where the DV rate function quantifies the convergence of empirical measures and clarifies the effect of refreshment rates (Bierkens et al., 2019).

7. Summary Table: Donsker–Varadhan Formula—Contexts and Variational Forms

Context	Variational Form	Reference
KL divergence ($Q \ll P$)	$D_\mathrm{KL}(Q\|P) = \sup_{f} \left\{ \mathbb{E}_Q[f] - \log \mathbb{E}_P[e^{f}] \right\}$	(Birrell et al., 2020, Park et al., 2021)
Rényi divergence	$R_\alpha(Q\|P) = \sup_{g} \left\{ \frac{1}{\alpha-1}\log \mathbb{E}_Q[e^{(\alpha-1)g}] - \frac{1}{\alpha}\log \mathbb{E}_P[e^{\alpha g}] \right\}$	(Birrell et al., 2020)
Empirical measure LDP (diffusion)	$I(\mu) = -\inf_{u>0} \int \frac{L u}{u}\, d\mu$	(Dupuis et al., 2013, Bertini et al., 2022)
Controlled Markov chain (reward rate)	$\lambda^* = \sup_{\mu} \left\{ \int r\, d\mu - \int D(\tilde{p}\,\|\,p)\, d\mu \right\}$	(Arapostathis et al., 2019, Anantharam et al., 2015)
Possibility theory (maxitive)	DV form with integrals replaced by suprema and KL divergence replaced by max-relative entropy	(Singh et al., 26 Nov 2025)
Zig-zag process PDMP	$I(\mu) = -\inf_{u>0} \int \frac{L u}{u}\, d\mu$, explicit minimization in $u$	(Bierkens et al., 2019)

The DV variational framework thus serves as a unifying principle across stochastic processes, divergence estimation, spectral theory, control, and contemporary machine learning, underpinning both theoretical advances and practical algorithmic developments.