Moderate Deviation Principles (MDP)

Updated 3 April 2026

Moderate Deviation Principles (MDP) quantify asymptotic probabilities for fluctuations between central limit and large deviation regimes.
The methodology employs precise scaling, variational formulations, and quadratic rate functions specific to diverse stochastic models.
Applications span interacting diffusions, exclusion processes, and recursive algorithms, informing rare-event simulation and statistical estimation.

A moderate deviation principle (MDP) describes the asymptotic probabilities of stochastic fluctuations that are larger than those governed by central limit theorems (CLT) but smaller than those covered by large deviation principles (LDP). Formally, an MDP bridges the gap between the Gaussian scaling, where fluctuations are typically of size $O(n^{-1/2})$, and the exponentially unlikely events on the $O(1)$ scale of a full LDP. It quantifies the decay rates for “moderately rare” events: their probabilities vanish, unlike those of CLT-scale events, but more slowly than the exponential-in-$n$ decay of events on the LDP scale. The mathematical structure of the MDP involves precise scalings, variational formulations, and functional-analytic constraints that depend on the stochastic model class under consideration.

1. Fundamental Definition and Scaling Regimes

Let $\{X_n\}$ be a sequence of random variables or empirical processes satisfying a law of large numbers $X_n \to x_0$ in probability, a CLT at the scale $b_n$ (typically $b_n = \sqrt{n}$), and an LDP at a much larger scale $U_n$ (e.g., $U_n = n$). The MDP typically involves a scaling sequence $a_n$ such that $a_n \to \infty$ and $a_n / b_n \to 0$ as $n \to \infty$ (i.e., $a_n$ grows faster than 1 but much slower than $b_n$). The moderate deviations regime is governed by the normalized variable $Y_n = \frac{b_n}{a_n}(X_n - x_0)$, and an MDP states that for Borel sets $A$: $-\inf_{y \in A^\circ} I(y) \le \liminf_{n \to \infty} \frac{1}{a_n^2} \log P(Y_n \in A) \le \limsup_{n \to \infty} \frac{1}{a_n^2} \log P(Y_n \in A) \le -\inf_{y \in \bar{A}} I(y)$ with $I$ a good, quadratic rate function. The rate function and the path space (e.g., $C([0,T];\mathbb{R}^d)$, $C([0,T];\mathcal{S}')$, $D([0,T];\ell_2)$, or a space of signed measures) are model-specific (Budhiraja et al., 2015, Tsirelson, 2017, Slavík, 2020).
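As a concrete check of this scaling, the following minimal sketch evaluates exact binomial tails for fair coin flips, a toy model chosen here for illustration and not taken from the cited works. It assumes $a_n = n^{1/4}$ and $Y_n = \sqrt{n}(\bar{X}_n - 1/2)/a_n$, for which the quadratic rate function would be $I(y) = y^2/(2\sigma^2) = 2y^2$ with $\sigma^2 = 1/4$. All function names are hypothetical.

```python
import math

def log_binom_pmf(n, k):
    """log P(S_n = k) for S_n ~ Binomial(n, 1/2)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1) - n * math.log(2.0))

def log_upper_tail(n, k0):
    """log P(S_n >= k0) by log-sum-exp; assumes k0 > n/2 so terms decrease."""
    top = log_binom_pmf(n, k0)
    total = 0.0
    for k in range(k0, n + 1):
        diff = log_binom_pmf(n, k) - top
        if diff < -60.0:  # remaining terms are negligible in double precision
            break
        total += math.exp(diff)
    return top + math.log(total)

def mdp_normalized_log_prob(n, y, alpha=0.25):
    """(1 / a_n^2) * log P(Y_n >= y) with the assumed scaling a_n = n**alpha."""
    a_n = n ** alpha
    k0 = math.ceil(n / 2 + y * a_n * math.sqrt(n))
    return log_upper_tail(n, k0) / a_n ** 2

if __name__ == "__main__":
    y = 0.5
    for n in (10**2, 10**4, 10**6):
        # Compare with the quadratic prediction -I(y) = -2 * y**2.
        print(n, mdp_normalized_log_prob(n, y), -2.0 * y * y)
```

The normalized log-probabilities approach $-2y^2$ only slowly, because the polynomial prefactor of the tail contributes a correction of order $\log(a_n)/a_n^2$.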

2. Representative Models and Path-Space Frameworks

MDPs have been rigorously established for a wide range of stochastic models:

Weakly Interacting Diffusions and Jump Processes: For empirical measure processes associated with mean-field SDEs (e.g., McKean–Vlasov systems) and pure jump Markov processes, MDPs are formulated for centered, scaled empirical measure-valued processes. Path spaces include $C([0,T];\mathcal{S}')$, the space of continuous paths with values in the dual of the Schwartz space, and $D([0,T];\ell_2)$ for countable jump systems (Budhiraja et al., 2015).
Particle Systems, Reaction-Diffusion, and Exclusion Processes: Fluctuation fields in weakly asymmetric exclusion processes (WASEP) and reaction-diffusion models admit MDPs in the topology of distributions or test-function duals, with the rate function determined via variational (Laplace principle) formulations, typically exhibiting quadratic structure constrained by conservation or hydrodynamic equations (Zhao, 2024, Zhao, 2024).
Stochastic Recursive Algorithms: For the continuous-time interpolation of recursive stochastic algorithms, MDPs hold for the scaled deviation processes, often in $C([0,T];\mathbb{R}^d)$, with the speed and quadratic rate function determined by local linearization and the noise covariance structure (Dupuis et al., 2014).
Multiscale and Slow–Fast Systems: In two-time-scale diffusions, reaction-diffusion SPDEs, or systems with fast Markovian environments, the MDP describes moderate fluctuations around an averaged (or homogenized) limit, integrating weak convergence techniques, stochastic control, and occupation measures in infinite-dimensional path spaces (Morse et al., 2016, Gasteratos et al., 2020, Qian, 28 Nov 2025).
Random Fields (Hierarchical Approach): For splittable or CMS random fields (i.e., fields admitting suitable “splitting” structure), MDPs are established for integrals over large boxes and for empirical linear statistics against test functions, employing hierarchical induction on dimension and cumulant generating function chaining (Tsirelson, 2017, Tsirelson, 2017, Tsirelson, 2018, Tsirelson, 2019).

3. Variational Rate Functions and Structural Features

MDP rate functions typically involve quadratic cost functionals subject to linear constraints encoding the (possibly infinite-dimensional) linearized dynamics:

Interacting Diffusions: The rate function for empirical measure deviation $\eta$ is given by

$I(\eta) = \inf_{u} \frac{1}{2} \int_0^T \int_{\mathbb{R}^d} |u(t,x)|^2 \, \mu_t(dx) \, dt$

where $u$ represents the control enforcing $\eta$’s deviation, linked to solutions of a linearized PDE or SDE along the LLN trajectory $\mu$ (Budhiraja et al., 2015).

Markov Jump Systems: Similar quadratic forms arise for jump processes, on spaces like $D([0,T];\ell_2)$.
SPDEs and Infinite-Dimensional Systems: The MDP rate is an action functional over control fields in $L^2([0,T];U)$, with $U$ the Hilbert space carrying the driving noise, often arising via the solution to an associated “skeleton” (deterministic controlled) equation. For example, in stochastic primitive equations,

$I(h) = \inf_{\{u \in L^2([0,T];U) \,:\, h = \mathcal{G}^0(u)\}} \frac{1}{2} \int_0^T \|u(s)\|_U^2 \, ds$

where $\mathcal{G}^0$ maps controls to deterministic paths via the skeleton equation (Slavík, 2020).

Martingale and Empirical Process Settings: In models based on recursive algorithms or BMCs, the MDP rate function is given in terms of the solution to a linear ODE with noise covariance appearing in the quadratic cost (Dupuis et al., 2014, Penda et al., 2011).
Random Fields: For splittable random fields, the MDP rate function is universally quadratic, $I(x) = x^2/(2\sigma^2)$ with $\sigma^2$ the asymptotic variance, under box averaging or test function integration, reflecting CLT-typical Gaussian structure in the moderate regime (Tsirelson, 2017, Tsirelson, 2018).
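To illustrate how a quadratic cost subject to a linear constraint yields a quadratic rate function, the sketch below treats a scalar controlled linear ODE $\dot{\phi} = -\theta\phi + \sigma u$, $\phi(0) = 0$. This is an assumed toy stand-in for the linearized dynamics, not a model from the cited papers, and the function names are hypothetical. Minimizing $\frac{1}{2}\int_0^T u(t)^2\,dt$ subject to $\phi(T) = z$ gives $z^2/(2W_T)$ with $W_T = \sigma^2(1 - e^{-2\theta T})/(2\theta)$; the code compares this closed form with a discretized evaluation of the optimal control.

```python
import math

def gramian(theta, sigma, T):
    """W_T = integral over [0, T] of sigma^2 * exp(-2 * theta * (T - t)) dt."""
    return sigma ** 2 * (1.0 - math.exp(-2.0 * theta * T)) / (2.0 * theta)

def rate_closed_form(z, theta, sigma, T):
    """Minimal cost z^2 / (2 W_T) of steering phi from 0 to z at time T."""
    return z ** 2 / (2.0 * gramian(theta, sigma, T))

def rate_by_discretization(z, theta, sigma, T, steps=20000):
    """Run the ODE under the optimal control; return (phi(T), accumulated cost)."""
    w = gramian(theta, sigma, T)
    dt = T / steps
    phi, cost = 0.0, 0.0
    for i in range(steps):
        t = (i + 0.5) * dt  # midpoint of the current time step
        # Optimal control: proportional to the constraint kernel sigma * exp(-theta (T - t)).
        u = z * sigma * math.exp(-theta * (T - t)) / w
        phi += dt * (-theta * phi + sigma * u)  # explicit Euler step
        cost += 0.5 * u * u * dt
    return phi, cost

if __name__ == "__main__":
    phi_T, cost = rate_by_discretization(1.0, 1.0, 1.0, 1.0)
    print(phi_T, cost, rate_closed_form(1.0, 1.0, 1.0, 1.0))
```

The cost is quadratic in the target $z$, and the noise intensity $\sigma^2$ enters only through $W_T$, mirroring the role of the noise covariance in the rate functions listed above.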

4. Methodological Framework: Variational Representations and Proof Structure

The dominant proof methodologies for MDPs—especially for infinite-dimensional or path-dependent models—employ variational representations of exponential functionals of stochastic processes:

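Budhiraja–Dupuis Weak Convergence Approach: The Laplace principle,

$\lim_{n \to \infty} -\frac{1}{a_n^2} \log E\left[\exp\left(-a_n^2 F(Y_n)\right)\right] = \inf_{y} \{F(y) + I(y)\}$

for bounded continuous $F$ (with $Y_n$ the normalized deviation process, $a_n^2$ the speed, and $I$ the rate function), is established by characterizing limits of controlled versions of the underlying process and connecting candidate large deviations to optimal controls in a deterministic variational problem (Budhiraja et al., 2015, Qian, 28 Nov 2025, Morse et al., 2016).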

Martingale and Exponential Martingale Methods: For particle and interacting system models, exponential martingales and Feynman–Kac formulae are deployed to construct tight upper bounds and to identify rate functions via entropy minimization and duality (Zhao, 2024, Zhao, 2024).
Hierarchical Inductive Control of CGFs: For spatial random fields, a succession of single-coordinate halvings, leak estimates, and cumulant bounds is iterated to establish quadratic control of the generating function, which yields the MDP via a Gärtner–Ellis-type argument (Tsirelson, 2017, Tsirelson, 2017).
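The Gärtner–Ellis-type step can be illustrated in the simplest setting of i.i.d. sums, assumed here only as a worked example and not the random-field setting of the cited papers. Suppose $\xi_1, \xi_2, \dots$ are i.i.d. with mean $m$ and variance $\sigma^2$, that $\Lambda(t) = \log E\, e^{t(\xi_1 - m)}$ is finite near $0$, and set $Y_n = \frac{1}{a_n \sqrt{n}} \sum_{i \le n} (\xi_i - m)$ with $a_n \to \infty$ and $a_n/\sqrt{n} \to 0$. Then:

```latex
\begin{aligned}
\frac{1}{a_n^2}\log E\, e^{a_n^2 \lambda Y_n}
  &= \frac{n}{a_n^2}\,\Lambda\!\left(\frac{\lambda a_n}{\sqrt{n}}\right)
   = \frac{n}{a_n^2}\left(\frac{\sigma^2 \lambda^2 a_n^2}{2n}
     + O\!\left(\frac{a_n^3}{n^{3/2}}\right)\right)
   \;\longrightarrow\; \frac{\sigma^2 \lambda^2}{2}, \\
I(y) &= \sup_{\lambda \in \mathbb{R}}
        \left\{\lambda y - \frac{\sigma^2 \lambda^2}{2}\right\}
      = \frac{y^2}{2\sigma^2}.
\end{aligned}
```

The remainder is of order $a_n/\sqrt{n}$, which is why the quadratic limit needs $a_n/\sqrt{n} \to 0$; the supremum is attained at $\lambda = y/\sigma^2$.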

5. Canonical Examples and Applications

MDPs have been computed explicitly for a diverse range of systems:

Model Class	Rate Function (examples)	Path Space/Topology
Interacting Diffusions (McKean–Vlasov)	Quadratic action functional over deviations	$C([0,T];\mathcal{S}')$
Weakly Asymmetric Exclusion (WASEP)	Supremum of linear functional minus Dirichlet form	$D([0,T];\mathcal{S}')$
Join-the-Shortest-Queue-d Systems	Quadratic variational functional on $\ell_2$-paths	$D([0,T];\ell_2)$
Bifurcating Markov Chains	$I(x) = x^2/(2\sigma^2)$ with $\sigma^2$ dependent on ergodic variance	Real line or finite-dimensional vector
Splittable Random Fields	$I(x) = x^2/(2\sigma^2)$ for box averages or test function integrals	$\mathbb{R}$, test-function dual
Recursive Stochastic Algorithms	Quadratic action along the linearized ODE, weighted by the inverse noise covariance	$C([0,T];\mathbb{R}^d)$

Beyond the abstract theory, MDPs are instrumental in numerical rare-event simulation, statistical estimation theory, and the study of phase transitions in high-dimensional statistical mechanics (Dupuis et al., 2014, Zhao, 2024, Penda et al., 2011).

6. Assumptions, Technical Requirements, and Limitations

The precise form of the MDP and the validity of the quadratic rate function depend on:

Regularity of Dynamics: Smoothness and Lipschitz conditions on drift and diffusion coefficients, global Lipschitz continuity in both state and measure arguments, and boundedness of derivatives up to suitable order (for interacting diffusions) are typically necessary (Budhiraja et al., 2015).
Ergodicity and Mixing: Rapid mixing / exponential ergodicity for the fast variable (in multiscale systems) and for Markov chains ensures the effective collapse of the occupation measures to those governed by the invariant law (Qian, 28 Nov 2025).
Non-degeneracy and Polynomial Growth: Non-degeneracy of the noise is sufficient but not always necessary (certain degenerate SDEs are covered when averaging arguments are available), and growth conditions control the tail behavior of additive functionals (Ren et al., 2021).
Structural Model Constraints: For splittable random fields, the entire construction of the MDP hinges on the ability to recursively split the field along coordinate axes, bounding the remainder "leak" terms and ensuring cumulant generating function control (Tsirelson, 2017, Tsirelson, 2017).
Scaling Constraints: The MDP regime strictly requires $a_n \to \infty$, $a_n / b_n \to 0$; the scaling must remain within a zone where quadratic approximations of the log-moment-generating function remain valid, but the Gaussian limit no longer dominates tail probabilities.
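The scaling constraint can be made concrete with a small sketch, assuming fair coin flips as a toy model (not drawn from the cited works; function names are hypothetical). On the LDP scale the exact rate for a deviation $\delta$ of the sample mean from $1/2$ is a relative entropy, while the MDP prediction is the quadratic $\delta^2/(2\sigma^2) = 2\delta^2$. The two agree only as $\delta \to 0$, which is the situation in the moderate regime, where $\delta_n = y\,a_n/\sqrt{n} \to 0$.

```python
import math

def ldp_rate_fair_coin(delta):
    """Relative entropy of Bernoulli(1/2 + delta) w.r.t. Bernoulli(1/2), 0 < delta < 1/2."""
    p = 0.5 + delta
    return p * math.log(2.0 * p) + (1.0 - p) * math.log(2.0 * (1.0 - p))

def quadratic_rate(delta, sigma2=0.25):
    """Gaussian (moderate-deviation) rate delta^2 / (2 sigma^2)."""
    return delta ** 2 / (2.0 * sigma2)

def ratio(delta):
    """Exact LDP rate divided by its quadratic approximation."""
    return ldp_rate_fair_coin(delta) / quadratic_rate(delta)

if __name__ == "__main__":
    for delta in (0.4, 0.1, 0.01):
        print(delta, ratio(delta))
```

A ratio close to 1 indicates the zone where the quadratic approximation is adequate; at deviations of order one the higher-order terms of the log-moment-generating function are no longer negligible.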

7. Broader Context and Significance

The MDP precisely characterizes the regime where fluctuations are too rare for CLT arguments but too frequent for full LDP analysis, providing refined estimates for Gaussian-type tails and the rough shape of moderately large deviations. The quadratic structure of the rate function and the path-level variational principles (or skeleton equations) that arise in MDPs carry direct algorithmic implications (efficient importance sampling, Riccati-based controls) and underpin rare-event estimation in complex stochastic systems (Dupuis et al., 2014).
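A minimal importance-sampling sketch, assuming i.i.d. standard normal summands (a toy model chosen for illustration, not the schemes of Dupuis et al.): each summand's mean is shifted to the boundary $c$ of the rare set $\{\bar{X}_n \ge c\}$, and samples are reweighted by the likelihood ratio. The function names, sample size, and seed are hypothetical choices.

```python
import math
import random

def exact_tail(n, c):
    """P(mean of n i.i.d. N(0,1) >= c), from the Gaussian tail."""
    return 0.5 * math.erfc(c * math.sqrt(n) / math.sqrt(2.0))

def importance_sampling_tail(n, c, samples=20000, seed=0):
    """Estimate the same probability after shifting each summand's mean to c."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        # Under the shifted law, the sum of n N(c,1) variables is N(n*c, n).
        s = rng.gauss(n * c, math.sqrt(n))
        if s >= n * c:
            # Likelihood ratio of the original law to the shifted law,
            # which depends on the summands only through their sum s.
            total += math.exp(-c * s + 0.5 * n * c * c)
    return total / samples

if __name__ == "__main__":
    n = 10**4
    a_n = n ** 0.25          # assumed moderate-deviation scaling
    c = 0.5 * a_n / math.sqrt(n)
    print(importance_sampling_tail(n, c), exact_tail(n, c))
```

With these parameters the target probability is of order $10^{-7}$, so naive Monte Carlo with the same number of samples would almost surely return zero, whereas under the shifted law about half of the samples land in the rare set.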

In conclusion, the moderate deviation principle serves as a rigorous bridge between the central limit and large deviations domains, unifying Gaussian fluctuation theory with the calculus of rare macroscopically significant events, across an array of stochastic models from particle systems to random fields, interacting diffusions, and recursive algorithms (Budhiraja et al., 2015, Tsirelson, 2017, Tsirelson, 2016, Dupuis et al., 2014, Zhao, 2024).