Donsker–Varadhan Lower Bound in Large Deviations

Updated 18 April 2026

The Donsker–Varadhan lower bound is a variational principle that characterizes large-deviation rate functions in stochastic systems using Legendre–Fenchel transforms.
It employs saddle-point formulations, exponential martingales, and entropy-based arguments to derive sharp eigenvalue bounds even under nonuniform ergodicity.
The formulation underpins practical methods in density estimation, mutual information estimation, and statistical testing across diverse stochastic models.

The Donsker–Varadhan (DV) lower bound is a foundational variational principle in large deviation theory, quantifying the exponential rate at which probabilities of atypical empirical measures decay in Markov processes and related stochastic systems. The central insight of Donsker and Varadhan was that such rare-event probabilities can be characterized in terms of saddle-point or variational functionals involving expectations over test functions or auxiliary tilting measures. The DV lower bound provides exact rate functions in a variety of settings, including Markov jump processes, diffusions, and even high-dimensional control and statistical learning contexts, and it is structurally robust to a lack of uniform ergodicity and to the presence of absorbing states.

1. Mathematical Formulation and Variational Principles

The archetypal DV lower bound gives a variational representation for the large-deviation rate function of the empirical measure of a Markov process. For a continuous-time Markov chain $\{X_t:t\ge0\}$ on a compact metric space $E$ with transition rates $c(x,dy)=r(x)p(x,dy)$ (where $r:E \to [0,\infty)$ is the jump rate and $p$ is a Feller kernel), the empirical measure is defined by

$\mu_T(f) = \frac{1}{T} \int_0^T f(X_t)\,dt\,, \qquad f\in C(E)\ .$

The DV rate function for $\mu_T$ is

$I(\mu) = \sup_{f \in C(E)} \left\{ \langle \mu, f \rangle - \Lambda(f) \right\}\ ,$

where $\Lambda(f)$ is the logarithmic moment generating function (principal eigenvalue) of the perturbed generator: $\Lambda(f) = \lim_{T \to \infty} \frac{1}{T} \log \mathbb{E}_x \exp \left\{ \int_0^T f(X_t)\,dt \right\}\,.$ This variational character holds even in the presence of absorbing states under integrability and compactness assumptions (Basile et al., 2013).
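The definitions above can be checked numerically on the smallest nontrivial case. The following is a minimal sketch, assuming a two-state chain with jump rates `a` (state 0 to 1) and `b` (state 1 to 0); the function names and the ternary-search solver are illustrative choices, not taken from the cited references. For a finite state space, $\Lambda(f)$ is the largest eigenvalue of the generator plus $\mathrm{diag}(f)$, and the supremum defining $I(\mu)$ reduces to a concave one-dimensional problem.

```python
import math

def log_mgf(f0, f1, a, b):
    """Largest eigenvalue of L + diag(f) for the two-state generator
    L = [[-a, a], [b, -b]], using the closed form for a 2x2 matrix."""
    trace = (f0 - a) + (f1 - b)
    gap = (f0 - a) - (f1 - b)
    return 0.5 * (trace + math.sqrt(gap * gap + 4.0 * a * b))

def dv_rate(mu0, a, b, lo=-60.0, hi=60.0, iters=300):
    """I(mu) = sup_f { <mu, f> - Lambda(f) } for mu = (mu0, 1 - mu0).
    Adding a constant to f does not change the objective, so f1 = 0 is
    fixed and the concave problem in f0 is solved by ternary search."""
    def objective(s):
        return mu0 * s - log_mgf(s, 0.0, a, b)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))

def dv_rate_closed_form(mu0, a, b):
    """Known closed form for the two-state chain, used as a cross-check."""
    return (math.sqrt(a * mu0) - math.sqrt(b * (1.0 - mu0))) ** 2
```

With `a = 1.0` and `b = 2.0`, `dv_rate(0.5, 1.0, 2.0)` should agree with `dv_rate_closed_form(0.5, 1.0, 2.0)`, and the rate should vanish at the invariant measure `mu0 = b / (a + b)`.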

For diffusion processes or jump Markov chains, the DV functional appears as the Legendre–Fenchel transform of the log-principal-eigenvalue of the tilted generator: $I(\mu) = \sup_{f \in C(E)} \left\{ \langle \mu, f \rangle - \Lambda(f) \right\}$. Equivalent forms, such as representing $I$ by contraction over flows or expressing it directly via entropic integrals of empirical flows, are also established (Basile et al., 2013).

2. Proof Structure and Martingale/Entropy Arguments

The standard proof of the DV large-deviation principle has two parts, an upper bound and the matching lower bound:

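Upper Bound: One constructs an exponential martingale (the "DV martingale") using the tilting function. For continuous-time chains on Polish spaces with generator $L$ and a positive function $u$ in its domain (discrete-time chains admit an analogous product form),

$M_t = \frac{u(X_t)}{u(X_0)} \exp\left\{ -\int_0^t \frac{(Lu)(X_s)}{u(X_s)}\,ds \right\}$

is a martingale, leading to non-asymptotic exponential tail bounds for the empirical measure (Cerf, 2022). The Laplace–Varadhan method then yields an asymptotic upper bound with rate function $I$.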

Lower Bound: An explicit entropy minimization (tilting) construction generates a new Markov law under which the empirical process converges to the desired target $\mu$, and the relative entropy cost per unit time converges to $I(\mu)$. Density and approximation arguments extend this to all relevant empirical measures (Basile et al., 2013). The lower bound thus follows from a "change-of-measure/relative entropy" argument, ensuring the variational formula is sharp.
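The change-of-measure step can be made concrete on a two-state chain. The sketch below is an illustration under stated assumptions, not the construction of the cited papers: rates `a` (state 0 to 1) and `b` (state 1 to 0) are tilted to `a * t` and `b / t`, with `t` chosen so that the target `mu = (mu0, 1 - mu0)` is invariant for the tilted chain, and the relative entropy per unit time is computed with the standard jump-process formula $\sum_x \mu(x) [\tilde c \log(\tilde c / c) - \tilde c + c]$. All function names are hypothetical.

```python
import math

def tilted_rates(mu0, a, b):
    """Rates (a*t, b/t) of the tilted chain, with t = u(1)/u(0) chosen so
    that mu = (mu0, 1 - mu0) is invariant. Requires 0 < mu0 < 1."""
    mu1 = 1.0 - mu0
    t = math.sqrt(mu1 * b / (mu0 * a))
    return a * t, b / t

def entropy_rate(mu0, a, b):
    """Relative entropy per unit time of the tilted chain with respect to
    the original chain, evaluated in the tilted stationary state mu."""
    mu1 = 1.0 - mu0
    a_tilt, b_tilt = tilted_rates(mu0, a, b)

    def cost(new, old):
        return new * math.log(new / old) - new + old

    return mu0 * cost(a_tilt, a) + mu1 * cost(b_tilt, b)

def dv_rate_closed_form(mu0, a, b):
    """Two-state DV rate function, used to check that the cost is sharp."""
    return (math.sqrt(a * mu0) - math.sqrt(b * (1.0 - mu0))) ** 2
```

For this example the entropy cost of the tilted law coincides with the two-state rate function, which is the sharpness statement of the lower bound in miniature.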

This approach is robust and extends to the empirical joint distribution of measure and flow, to path-space (level-3) LDPs, and even to the setting of infinite-dimensional Markov semigroups (Zhao, 17 Jun 2025, Bertini et al., 2021).

3. Scope, Assumptions, and Extensions

Generic Requirements

The Donsker–Varadhan lower bound holds under broad structural conditions:

Compact metric (Polish) state space or suitable tightness/exponential integrability (Basile et al., 2013, Zhao, 17 Jun 2025)
Transition kernel with continuous, strictly positive density (for jump processes), Feller properties
Existence of an invariant measure (possibly nonunique due to absorbing sets)
Integrability conditions on the jump rates
Initial state avoidance of absorbing sets

It applies to time-homogeneous Markov chains (in discrete or continuous time), diffusions, jump processes, and infinite-dimensional semigroups (e.g., white-forced Navier–Stokes (Zhao, 17 Jun 2025)).

Generalizations

Empirical Flow and Joint LDPs: The joint empirical measure and empirical flow (number of jumps between states per unit time) satisfy a joint LDP with an explicit convex rate functional; contraction yields the DV formula for the measure (Basile et al., 2013).
Infinite Dimensions: DV lower bounds with analogous variational structure extend to path-space empirical measures in infinite-dimensional dynamics via spectral and entropy methods (Zhao, 17 Jun 2025).
Composite Hypotheses (Statistical Testing): For testing against composite nulls, a saddle-point DV formula involving infimization over null laws and supremization over test functions characterizes the optimal information-theoretic performance of anytime-valid tests (Shekhar, 23 Dec 2025).
Controlled Markov Processes: The DV formula extends to risk-sensitive and controlled Markov models, yielding Bellman-type equations and linear/convex optimization programs for ergodic occupation measures (Arapostathis et al., 2019).
Systems Without Uniform Ergodicity: The DV formula holds even in degenerate or reducible contexts, with the zero set of the rate function reflecting possible nonuniqueness of invariant measures or the presence of absorbing states (Basile et al., 2013, Eizenberg, 1 Sep 2025).

4. Algorithmic and Applied Implications

The variational structure of the DV lower bound underpins a variety of algorithmic and statistical methods:

Density Estimation via Neural Networks: The DV representation for the Kullback–Leibler divergence $D_{\mathrm{KL}}(P\|Q)$,

$D_{\mathrm{KL}}(P\|Q) = \sup_{T} \left\{ \mathbb{E}_P[T] - \log \mathbb{E}_Q\left[e^{T}\right] \right\}\ ,$

enables estimation of log-densities (and hence densities) from data using deep networks to parameterize $T$ ("critics") and stochastic optimization to maximize the bound (Park et al., 2021). The supremum is realized at $T^* = \log \frac{dP}{dQ}$ up to a constant; when $Q$ is known explicitly (e.g., uniform measure), this forms the foundation for likelihood-free density modeling.
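A small discrete check makes the role of the optimal critic explicit. This is a minimal sketch, assuming finite distributions stored as lists and a uniform reference `q`; the function names are illustrative. It evaluates the DV objective $\mathbb{E}_P[T] - \log \mathbb{E}_Q[e^T]$, confirms that $T = \log(p/q)$ attains the KL divergence, and recovers the density from a critic known only up to an additive constant.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for finite distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dv_objective(critic, p, q):
    """E_P[T] - log E_Q[exp(T)] for a critic T given as a list of values."""
    mean_p = sum(pi * t for pi, t in zip(p, critic))
    log_mgf_q = math.log(sum(qi * math.exp(t) for qi, t in zip(q, critic)))
    return mean_p - log_mgf_q

def density_from_critic(critic, q):
    """Recover p from a critic equal to log(p/q) up to an additive constant:
    normalizing q * exp(T) removes the unknown constant."""
    weights = [qi * math.exp(t) for qi, t in zip(q, critic)]
    total = sum(weights)
    return [w / total for w in weights]

p = [0.5, 0.3, 0.2]
q = [1.0 / 3.0] * 3                       # known reference (uniform)
optimal = [math.log(pi / qi) for pi, qi in zip(p, q)]
shifted = [t + 1.7 for t in optimal]      # same critic up to a constant
recovered = density_from_critic(shifted, q)
```

Any other critic gives a value no larger than the KL divergence, which is why maximizing the objective over a parametric family yields a lower bound.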

Mutual Information Estimation and Representation Learning: In deep contrastive learning (e.g., Deep InfoMax, MINE, RLHF), the DV lower bound on mutual information,

$I(X;Y) \ge \mathbb{E}_{P_{XY}}[T_\theta] - \log \mathbb{E}_{P_X \otimes P_Y}\left[e^{T_\theta}\right]\ ,$

is optimized using critic networks $T_\theta$, positive/negative sample pairs, and minibatch-based stochastic ascent (Lv et al., 27 Jun 2025).
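The optimization loop can be illustrated without a neural network. The sketch below is a simplification under stated assumptions: the critic is a lookup table over a 2x2 joint distribution, expectations are computed exactly rather than from minibatches, and plain gradient ascent stands in for stochastic ascent. Positive pairs correspond to the joint law and negative pairs to the product of marginals. Names such as `fit_critic` are hypothetical.

```python
import math

def dv_bound(critic, joint, product):
    """DV objective: E_joint[T] - log E_product[exp(T)]."""
    mean_joint = sum(p * t for p, t in zip(joint, critic))
    log_mgf = math.log(sum(q * math.exp(t) for q, t in zip(product, critic)))
    return mean_joint - log_mgf

def fit_critic(joint, product, steps=2000, lr=1.0):
    """Gradient ascent on a tabular critic. The gradient of the DV objective
    with respect to entry i is p_i - q_i * exp(T_i) / Z."""
    critic = [0.0] * len(joint)
    for _ in range(steps):
        weights = [q * math.exp(t) for q, t in zip(product, critic)]
        z = sum(weights)
        critic = [t + lr * (p - w / z)
                  for t, p, w in zip(critic, joint, weights)]
    return critic

joint_table = [[0.4, 0.1], [0.1, 0.4]]
px = [sum(row) for row in joint_table]
py = [sum(col) for col in zip(*joint_table)]
joint = [joint_table[i][j] for i in range(2) for j in range(2)]
product = [px[i] * py[j] for i in range(2) for j in range(2)]

mutual_information = sum(p * math.log(p / q) for p, q in zip(joint, product))
critic = fit_critic(joint, product)
estimate = dv_bound(critic, joint, product)
```

Because the objective is concave in the tabular critic, the estimate approaches the true mutual information from below; with sampled minibatches and a neural critic the same objective is only estimated, so the bound holds in expectation at best.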

Statistical Testing: The DV characterization of "minimum KL divergence to the null" directly implies information-theoretic lower bounds for testing and informs the construction of optimal e-processes for sequential testing (Shekhar, 23 Dec 2025).
Eigenvalue Bounds: The original DV lower bound gives sharp and stable lower bounds on principal eigenvalues of elliptic operators, including quantile-based refinements, nonlinear variants, and landscape-function perspectives (Mugnolo, 2023, Lu et al., 2016).

5. Functional Forms and Duality Structure

The DV rate function $I$ is always the Legendre–Fenchel transform of the limiting cumulant-generating function,

$I(\mu) = \sup_{f \in C(E)} \left\{ \langle \mu, f \rangle - \Lambda(f) \right\}\ ,$

where $\Lambda(f)$ captures the exponential growth rate of expectations of exponentials of additive functionals. There are several equivalent forms, depending on the model:

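For jump Markov processes, an explicit variational representation:

$I(\mu) = \sup_{u > 0} \int_E \mu(dx) \int_E c(x,dy) \left[ 1 - \frac{u(y)}{u(x)} \right]\ ,$

with $c(x,dy)$ the jump kernel and the supremum taken over positive continuous functions $u$ (Basile et al., 2013).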
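As a worked illustration (a sketch added for concreteness, not taken from the cited references), consider a two-state chain on $E=\{0,1\}$ with jump rate $a>0$ from $0$ to $1$ and $b>0$ from $1$ to $0$; the symbols $a$, $b$, and $t$ are assumptions of this example. Using the standard form $I(\mu) = \sup_{u>0} \{ -\sum_x \mu(x)\,(Lu)(x)/u(x) \}$ with $(Lu)(x) = \sum_y c(x,y)[u(y)-u(x)]$, and writing $\mu = (\mu_0, \mu_1)$ and $t = u(1)/u(0)$,

$I(\mu) = \sup_{t>0} \left\{ a\mu_0 (1 - t) + b\mu_1 \left( 1 - \tfrac{1}{t} \right) \right\} = a\mu_0 + b\mu_1 - \inf_{t>0} \left\{ a\mu_0\, t + \tfrac{b\mu_1}{t} \right\}\ .$

The infimum is attained at $t^2 = b\mu_1/(a\mu_0)$ and equals $2\sqrt{ab\,\mu_0\mu_1}$, so

$I(\mu) = \left( \sqrt{a\mu_0} - \sqrt{b\mu_1} \right)^2\ ,$

which vanishes exactly at the invariant measure $\mu_0 = b/(a+b)$, $\mu_1 = a/(a+b)$.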

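For diffusions with detailed balance (reversible), the DV rate function can be written in a "Fisher information" form. In one common normalization, for overdamped dynamics $dX_t = -D\,\nabla V(X_t)\,dt + \sqrt{2D}\,dW_t$ with constant diffusivity $D$ and $\mu$ having density $\rho$:

$I(\mu) = D \int \left| \nabla \sqrt{\rho/\pi} \right|^2 \pi \, dx = \frac{D}{4} \int \rho \left| \nabla \log \frac{\rho}{\pi} \right|^2 dx\ ,$

with $\pi \propto e^{-V}$ the stationary density (Hoppenau et al., 2016).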
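The Fisher-information form can be sanity-checked by quadrature. The sketch below assumes a specific normalization that may differ from the cited reference: a one-dimensional Ornstein–Uhlenbeck process with generator $D(\partial_{xx} - x\,\partial_x)$, standard Gaussian stationary density $\pi$, and the rate written as $D \int |\partial_x \sqrt{h}|^2\, \pi\, dx$ with $h = \rho/\pi$. For a target $\rho = N(m, 1)$ this integral evaluates analytically to $D m^2/4$. The function name and grid parameters are illustrative.

```python
import math

def fisher_rate(shift, diffusivity=1.0, half_width=12.0, n=24000):
    """Approximate D * integral of |d/dx sqrt(h)|^2 * pi dx, where pi is the
    N(0, 1) density and h is the density of N(shift, 1) relative to pi."""
    dx = 2.0 * half_width / n

    def sqrt_h(x):
        # h(x) = exp(shift * x - shift^2 / 2) for unit-variance Gaussians
        return math.exp(0.5 * (shift * x - 0.5 * shift * shift))

    def stationary_density(x):
        return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * dx                       # midpoint rule
        slope = (sqrt_h(x + 0.5 * dx) - sqrt_h(x - 0.5 * dx)) / dx  # central difference
        total += slope * slope * stationary_density(x) * dx
    return diffusivity * total
```

Under these assumptions, `fisher_rate(1.0)` should be close to `0.25` and `fisher_rate(0.0)` should vanish, since the target then coincides with the stationary law.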

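For composite statistical models, the DV minimax formula:

$\inf_{Q \in \mathcal{P}_0} D_{\mathrm{KL}}(P\|Q) = \inf_{Q \in \mathcal{P}_0} \sup_{f} \left\{ \mathbb{E}_P[f] - \log \mathbb{E}_Q\left[e^{f}\right] \right\}\ ,$

with $\mathcal{P}_0$ the class of null laws, underpins both lower bounds and algorithmic solution construction via Sion’s minimax theorem (Shekhar, 23 Dec 2025).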
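A toy composite null shows the inf-sup structure. The sketch below is an illustration under stated assumptions, not the procedure of the cited work: the alternative is Bernoulli(0.7), the null class is Bernoulli(q) with q on a grid in [0.05, 0.5], the inner supremum over critics is solved by ternary search on a one-parameter critic, and the outer infimum is a grid minimum. All names are hypothetical.

```python
import math

def dv_sup(p, q, lo=-30.0, hi=30.0, iters=200):
    """sup over critics T = (t, 0) of E_P[T] - log E_Q[exp(T)] for
    Bernoulli(p) against Bernoulli(q); the objective is concave in t."""
    def objective(t):
        return p * t - math.log(q * math.exp(t) + 1.0 - q)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))

def kl_inf(p, null_grid):
    """Minimum KL divergence from Bernoulli(p) to the (gridded) null class."""
    return min(dv_sup(p, q) for q in null_grid)

def bernoulli_kl(p, q):
    """Closed-form KL divergence, used only as a cross-check."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

null_grid = [0.05 + 0.45 * k / 90 for k in range(91)]   # q from 0.05 to 0.50
value = kl_inf(0.7, null_grid)
```

Here the infimum is attained at the boundary null law q = 0.5, the member of the null class closest to the alternative.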

All these forms reflect the deep duality between tilted (exponentially changed) measures, principal eigenvalue problems, and large deviation rate functions.

6. Robustness: Degeneracy, Uniqueness, and Dynamical Limitations

The DV lower bound is structurally robust:

Nonuniqueness/Absorbing States: Even when uniform ergodicity fails (e.g., presence of absorbing states), the DV variational formula remains valid, and the zero level set of the rate function reflects all stationary or absorbing measures (Basile et al., 2013).
Infinite-Dimensional Dynamics: The DV rate function applies to stochastic PDEs and infinite-dimensional dynamics when spectral gap, dissipativity, and tightness conditions are established (Zhao, 17 Jun 2025). The proof is abstracted via spectral theory and generalized Feynman–Kac semigroups.
Probabilistic Cellular Automata: For high-dimensional or infinite systems (e.g., PCA on a lattice), the DV action functional remains meaningful, but the set $\{\mu : I(\mu) < \infty\}$ can be extremely restrictive: often only invariant measures have finite action, making lower bounds outside equilibrium trivial (Eizenberg, 1 Sep 2025).

A plausible implication is that for degenerate or high-dimensional Markov processes, the DV lower bound auto-enforces phase-space restrictions: only equilibrium (or tail-invariant) measures are "observable" at the large deviation scale.

7. Comparative Remarks and Practical Impact

The Donsker–Varadhan lower bound unifies various domains:

It supplies both sharp eigenvalue bounds and large deviation rate functions for Markov processes, jump processes, and diffusions.
Its variational structure is foundational in modern density estimation, mutual information estimation, and statistical testing under composite or high-dimensional alternatives.
The duality structure (logarithmic cumulant eigenvalues vs. Legendre transforms) allows its use in controlled Markov processes, risk-sensitive control, and dynamic programming (Arapostathis et al., 2019).

Contemporary research continues to exploit, refine, and generalize the DV framework, notably in statistical learning, deep generative modeling, rare-event simulation, and the analysis of high-dimensional Markov systems.

References:

See (Basile et al., 2013, Hoppenau et al., 2016, Cerf, 2022, Park et al., 2021, Shekhar, 23 Dec 2025, Zhao, 17 Jun 2025, Bertini et al., 2022, Lu et al., 2016, Arapostathis et al., 2019, Bertini et al., 2021, Eizenberg, 1 Sep 2025, Mugnolo, 2023) for full technical details and context.