
Variance-Adaptive Doob Martingale

Updated 20 November 2025
  • Variance-adaptive Doob martingales are stochastic processes that adjust concentration inequalities based on the realized conditional variance rather than fixed bounds.
  • They power adaptive algorithms in optimal stopping, sequential analysis, and off-policy learning by using variance-sensitive corrections and randomized minimization.
  • Their framework enables robust, low-variance estimators with self-normalized maximal inequalities, yielding tight confidence bounds and zero-variance properties in optimized settings.

A variance-adaptive Doob martingale is a martingale process whose deviation and concentration properties, as well as its performance in learning and optimization tasks, adapt to the realized conditional variance rather than a worst-case or fixed variance bound. In optimal stopping, statistical learning, and sequential analysis, this concept underpins tight maximal inequalities, adaptive confidence bounds, and dual algorithms featuring robust, low-variance estimators. The framework leverages the Doob decomposition, variance-sensitive concentration (often with iterated-logarithm corrections), and variance-driven selection among candidate martingales or estimators.

1. Foundational Definitions and Classical Construction

Let $(X_t)_{t=0,\dots,T}$ be an adapted process on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$. The Snell envelope $Y^*_t$ is the minimal supermartingale dominating $X$:

$Y^*_t = \operatorname*{ess\,sup}_{\tau \ge t} \mathbb{E}[X_\tau \mid \mathcal{F}_t].$

The Doob-Meyer decomposition gives $Y^*_t = Y^*_0 + M^*_t - A^*_t$, where $(M^*_t)$ is a martingale (the Doob martingale) and $(A^*_t)$ is a predictable, increasing compensator. $M^*_t$ can be written explicitly as:

$M^*_t = \sum_{s=1}^t \big(Y^*_s - \mathbb{E}[Y^*_s \mid \mathcal{F}_{s-1}]\big).$

This $M^*_t$ is the canonical Doob martingale; in the dual formulation of optimal stopping, it attains the tight pathwise upper bound on the value.
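
To make the construction concrete, here is a minimal sketch that computes the Snell envelope and the Doob martingale increments by backward induction on a toy recombining binomial tree with a put-style payoff; all model parameters are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

# Toy recombining binomial tree with a put-style payoff; all parameters
# (S0, K, u, d, p, T) are illustrative assumptions, not from the cited papers.
S0, K, u, d, p, T = 100.0, 100.0, 1.1, 0.9, 0.5, 5

# S[t][j]: asset price after t steps with j up-moves.
S = [S0 * u ** np.arange(t + 1) * d ** (t - np.arange(t + 1)) for t in range(T + 1)]
X = [np.maximum(K - s, 0.0) for s in S]  # payoff process X_t

# Backward induction for the Snell envelope Y*_t = max(X_t, E[Y*_{t+1} | F_t]).
Y = [None] * (T + 1)
Y[T] = X[T]
for t in range(T - 1, -1, -1):
    cont = p * Y[t + 1][1:] + (1 - p) * Y[t + 1][:-1]  # E[Y*_{t+1} | F_t]
    Y[t] = np.maximum(X[t], cont)

def doob_increments(path):
    """Doob martingale increments M*_t - M*_{t-1} = Y*_t - E[Y*_t | F_{t-1}]
    along a path of moves (1 = up, 0 = down)."""
    incs, j = [], 0
    for t, move in enumerate(path, start=1):
        cond_exp = p * Y[t][j + 1] + (1 - p) * Y[t][j]  # E[Y*_t | F_{t-1}]
        j += move
        incs.append(Y[t][j] - cond_exp)
    return incs

print("Y*_0 =", Y[0][0])
print("Doob increments, all-down path:", np.round(doob_increments([0] * T), 4))
```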

2. Variance Adaptivity in Martingale Concentration

Variance-adaptive martingale inequalities strengthen classical results by tying deviation bounds to the realized, predictable variance process rather than the time horizon. Let $(X_t)$ be a martingale difference sequence with $|X_t| \le B$ almost surely, adapted to the filtration $(\mathcal{F}_t)$. Define the conditional variance process $V_k = \sum_{t=1}^k \mathbb{E}[X_t^2 \mid \mathcal{F}_{t-1}]$ and the empirical variance $\widehat{V}_k = \sum_{t=1}^k X_t^2 - \frac{1}{k} S_k^2$, where $S_k = \sum_{t=1}^k X_t$.
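
As a small illustration of these two variance processes, the sketch below generates martingale differences $X_t = \sigma_t \varepsilon_t$ with Rademacher $\varepsilon_t$ and predictable scales $\sigma_t$ (an assumed toy model), so the conditional variance $V_k$ is known in closed form and can be compared with the empirical $\widehat{V}_k$.

```python
import numpy as np

# Assumed toy model: martingale differences X_t = sigma_t * eps_t with
# Rademacher eps_t and predictable scales sigma_t, so the conditional
# variance E[X_t^2 | F_{t-1}] = sigma_t^2 is known exactly.
rng = np.random.default_rng(0)
k = 1000
sigma = rng.uniform(0.1, 1.0, size=k)            # predictable scales (toy)
X = sigma * rng.choice([-1.0, 1.0], size=k)      # martingale differences

V = np.cumsum(sigma ** 2)                        # conditional variance V_k
S = np.cumsum(X)
V_hat = np.cumsum(X ** 2) - S ** 2 / np.arange(1, k + 1)  # empirical variance
print("V_k =", V[-1], " V_hat_k =", V_hat[-1])   # close for large k
```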

Variance-adaptive Doob martingale inequalities, such as those in "PAC-Bayes Iterated Logarithm Bounds for Martingale Mixtures" (Balsubramani, 2015), assert that for a martingale mixture $M_t^\rho$ (an expectation over a posterior $\rho$ on a family $\{M_t(h)\}$),

$M_t^\rho \le \lambda_0 V_t^\rho$

for small variances, and, with an optimal iterated-logarithm correction, for general $V_t^\rho$,

$M_t^\rho \le \sqrt{6(e-2)V_t^\rho \left[2 \ln\ln\left(\frac{3(e-2)V_t^\rho}{M_t^\rho}\right) + \ln\frac{2}{\delta} + \mathrm{KL}(\rho\|\pi)\right]}.$

This yields time-uniform, PAC-Bayes confidence bounds that shrink adaptively with the observed variance (Balsubramani, 2015).
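
Since $M_t^\rho$ appears on both sides of the displayed bound, evaluating the deviation radius requires solving an implicit inequality. The sketch below does this by a simple fixed-point iteration; $\delta$, the KL term, and the initial guess are illustrative assumptions, not prescriptions from the paper.

```python
import numpy as np

# Fixed-point evaluation of the iterated-logarithm radius: M_t^rho appears on
# both sides of the bound, so iterate r -> RHS(r). delta, the KL term, and
# the initial guess are illustrative assumptions.
def lil_radius(V, delta=0.05, kl=0.0, iters=50):
    c = np.e - 2
    r = np.sqrt(V)                                # crude initial guess
    for _ in range(iters):
        loglog = np.log(np.log(max(np.e, 3 * c * V / max(r, 1e-12))))
        r = np.sqrt(6 * c * V * (2 * loglog + np.log(2 / delta) + kl))
    return r

for V in (1.0, 10.0, 100.0):
    print(f"V = {V:6.1f}  ->  radius ~ {lil_radius(V):.3f}")
```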

3. Robustness and Optimality in Dual Formulation for Optimal Stopping

In the dual formulation of optimal stopping, the Rogers–Haugh–Kogan duality is

$Y^*_0 = \sup_{\tau \le T} \mathbb{E}[X_\tau] = \inf_{M:\, M_0 = 0} \mathbb{E}\left[\max_{0 \le t \le T}(X_t - M_t)\right].$

A martingale $M$ is called weakly optimal at $t$ if

$Y^*_t = \mathbb{E}\left[\max_{t \le s \le T}(X_s - M_s + M_t) \mid \mathcal{F}_t\right],$

and surely optimal if

$Y^*_t = \max_{t \le s \le T}(X_s - M_s + M_t) \quad \text{a.s.}$

Within the multiplicity of optimal martingales, only the Doob martingale $M^*$ maintains sure optimality under zero-mean, bounded perturbations $\eta_t$ of the form

$\eta_t = \xi_t (Y^*_t - X_t + A^*_t), \qquad \mathbb{E}[\xi_t] = 0, \quad \mathbb{P}(\xi_t < 1) = 1,$

yielding $\widetilde{M}_t = M^*_t - \eta_t$ (Belomestny et al., 2021). Any other surely optimal $M \ne M^*$ ceases to be optimal under comparable randomization.
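
A quick Monte Carlo check of sure optimality on the same kind of toy binomial model (parameters again illustrative): with the Doob martingale, the pathwise dual value $\max_t(X_t - M^*_t)$ equals $Y^*_0$ on every path, so its sample standard deviation is numerically zero, whereas the trivial choice $M = 0$ leaves positive variance.

```python
import numpy as np

# Toy binomial model (illustrative parameters). With the Doob martingale M*,
# the pathwise dual value max_t (X_t - M*_t) equals Y*_0 on every path
# (sure optimality, zero sample variance); with M = 0 the variance is positive.
rng = np.random.default_rng(1)
S0, K, u, d, p, T = 100.0, 100.0, 1.1, 0.9, 0.5, 4

S = [S0 * u ** np.arange(t + 1) * d ** (t - np.arange(t + 1)) for t in range(T + 1)]
X = [np.maximum(K - s, 0.0) for s in S]
Y = [None] * (T + 1)
Y[T] = X[T]
for t in range(T - 1, -1, -1):                       # Snell envelope
    Y[t] = np.maximum(X[t], p * Y[t + 1][1:] + (1 - p) * Y[t + 1][:-1])

def dual_value(path, use_doob):
    j, M, best = 0, 0.0, X[0][0]
    for t, move in enumerate(path, start=1):
        cond_exp = p * Y[t][j + 1] + (1 - p) * Y[t][j]
        j += move
        if use_doob:
            M += Y[t][j] - cond_exp                  # Doob increment
        best = max(best, X[t][j] - M)
    return best

paths = rng.integers(0, 2, size=(2000, T))           # fair coin matches p = 0.5
for use_doob in (True, False):
    vals = np.array([dual_value(pth, use_doob) for pth in paths])
    label = "M = Doob" if use_doob else "M = 0   "
    print(label, "mean:", round(vals.mean(), 4), "std:", round(vals.std(), 6))
```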

4. Randomized Dual Martingale Minimization and Variance Reduction

A randomized algorithm learns an optimal martingale $M(\alpha)$ within a parametric family by minimizing the dual objective augmented with randomization, using $N$ training paths:

$\widehat{J}_N(\alpha) = \frac{1}{N} \sum_{n=1}^N \max_{0 \le t \le T} \Big\{ X^{(n)}_t - M_t(\alpha; n) + \theta\, \xi^{(n)}_t \big(\widehat{Y}^{*(n)}_t - X^{(n)}_t + \widehat{A}^{*(n)}_t\big) \Big\},$

where $\theta$ tunes the strength of the randomization. This objective is piecewise linear and convex in $\alpha$ and can be solved as a linear program. The LP structure drives the solution toward the Doob martingale, achieving a "variance-adaptive" selection: for paths close to optimal exercise, the estimated martingale is automatically driven closer to the true compensator (Belomestny et al., 2021).
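
A minimal sketch of this LP in epigraph form follows, with synthetic basis-martingale increments, payoffs, and randomization terms standing in for quantities that would come from simulated paths and estimated $\widehat{Y}^*$, $\widehat{A}^*$; the shapes and data are assumptions for illustration, not the setup of the cited paper.

```python
import numpy as np
from scipy.optimize import linprog

# Epigraph LP for the randomized dual objective. B holds basis-martingale
# paths (M_0 = 0), X payoff paths, R the randomization terms
# theta * xi_t * (Yhat* - X + Ahat*); all inputs here are synthetic stand-ins.
rng = np.random.default_rng(2)
N, T, K = 200, 10, 5                         # paths, horizon, basis size
B = np.concatenate([np.zeros((N, 1, K)),
                    rng.normal(size=(N, T, K)).cumsum(axis=1)], axis=1)
X = rng.normal(size=(N, T + 1))
R = 0.5 * rng.normal(size=(N, T + 1))

# Variables v = (alpha_1..alpha_K, z_1..z_N); minimize (1/N) sum_n z_n
# subject to z_n >= X[n,t] - alpha . B[n,t] + R[n,t] for all n, t.
c = np.concatenate([np.zeros(K), np.ones(N) / N])
A_ub, b_ub = [], []
for n in range(N):
    for t in range(T + 1):
        row = np.zeros(K + N)
        row[:K] = -B[n, t]                   # -alpha . B[n,t]
        row[K + n] = -1.0                    # -z_n
        A_ub.append(row)
        b_ub.append(-(X[n, t] + R[n, t]))
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * (K + N))
alpha_hat = res.x[:K]
print("estimated dual bound:", res.fun)
```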

The resulting estimator has the following key properties:

  • Zero-variance property: If $M^*$ lies in the model class, the randomized criterion uniquely selects it, yielding zero simulation variance in the dual value estimate.
  • Suboptimality gap: Any alternative $M \ne M^*$ exhibits strictly positive variance in the dual estimator, since the pathwise dual estimator cannot fit all samples after randomization.
  • Convergence: As $N \to \infty$, the solution $\hat\alpha$ converges to the true $\alpha^*$, and the estimated dual bound converges to $Y^*_0$ (Belomestny et al., 2021).

5. Self-Normalized Maximal Inequalities and Empirical Variance Adaptivity

Variance-adaptive maximal inequalities are further refined in the self-normalized regime. For real-valued martingale differences $(X_t)$ with $|X_t| \le B$, the self-normalized maximal inequality asserts that, for each $\delta \in (0,1)$ and sequential entropy exponent $p$,

$\overline{S}_n = \frac{1}{n}\sum_{t=1}^n X_t \le C_1 \frac{\sqrt{\log(\log(n^{1/(2+p)}B^2)/\delta)}}{\sqrt{n}\,\hat{\sigma}_n^{1-p/2}} + C_2 \frac{\hat{\sigma}_n^{-p}}{n} + C_3 \frac{\log(\log(n^{1/(2+p)}B^2)/\delta)}{n^{2/(2+p)}}$

with $\hat{\sigma}_n = \sqrt{\widehat{V}_n}$, where the constants $C_1, C_2, C_3$ are universal (Girard et al., 17 Oct 2025). These bounds are uniform across the function/policy class and over stopping times, and they shrink adaptively with the realized sample variance.
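
The following sketch evaluates the three terms of this bound from a sample of bounded differences, using $\widehat{V}_n$ as defined in Section 2; since the universal constants $C_1, C_2, C_3$ are not specified numerically here, the values used are placeholders.

```python
import numpy as np

# Evaluate the three terms of the self-normalized bound for a sample of
# bounded differences. The universal constants C1, C2, C3 are not given
# numerically in the text, so the values below are placeholders.
def self_normalized_bound(X, B=1.0, delta=0.05, p=1.0, C1=1.0, C2=1.0, C3=1.0):
    n = len(X)
    S = X.sum()
    V_hat = (X ** 2).sum() - S ** 2 / n              # empirical variance
    sigma = np.sqrt(max(V_hat, 1e-12))               # sigma_hat_n
    L = np.log(max(np.e, np.log(max(np.e, n ** (1 / (2 + p)) * B ** 2)) / delta))
    term1 = C1 * np.sqrt(L) / (np.sqrt(n) * sigma ** (1 - p / 2))
    term2 = C2 * sigma ** (-p) / n
    term3 = C3 * L / n ** (2 / (2 + p))
    return term1 + term2 + term3

rng = np.random.default_rng(3)
X = rng.uniform(-1.0, 1.0, size=5000)                # mean-zero, |X_t| <= 1
print("bound on S_bar_n:", self_normalized_bound(X))
```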

6. Applications in Learning, Policy Evaluation, and Sequential Analysis

Variance-adaptive Doob-type martingales and associated inequalities are foundational for several domains:

  • Optimal stopping and American option pricing: Dual formulations with variance-adaptive martingales yield sharp, provable, and robust upper bounds for the stopping value, leveraging randomized minimization for tight confidence (Belomestny et al., 2021).
  • PAC-Bayes learning and statistical risk bounds: Martingale mixture concentration transforms the fixed-time Hoeffding-Azuma bounds into iterated-logarithm, variance-adaptive forms, enabling posterior- and data-driven generalization guarantees (Balsubramani, 2015).
  • Off-policy learning: In adaptive data settings (bandits, reinforcement learning), variance-regularized algorithms use the self-normalized inequality to construct empirical risk penalties or confidence intervals which contract when the conditional variance is low, improving over worst-case rates (Girard et al., 17 Oct 2025).
  • Empirical process theory: Sequential chaining and self-normalized deviation bounds generalize to infinite policy or function classes, with complexity controlled by entropy exponents.

7. Practical Implementation and Computational Considerations

  • The randomized dual martingale minimization can be reduced to a linear program, scaling linearly in the number of simulated paths and basis functions. Sparsity in the underlying function family or basis can be exploited for computational efficiency.
  • In high-dimensional or infinite family settings, one utilizes linear combinations of basis martingales or Hermite chaos expansions and regularizes to prevent overfitting.
  • The randomization scaling parameter $\theta$ is selected to balance convexification and variance, typically via cross-validation (Belomestny et al., 2021); a minimal grid-search sketch follows this list.
  • In all scenarios, variance adaptivity ensures that paths with low conditional variance contribute low uncertainty, leading to estimators whose excess risk or dual-variance concentrates sharply and adaptively with the sample.
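
The grid-search sketch referenced above, assuming the epigraph LP of Section 4: fit $\hat\alpha$ on training paths for each candidate $\theta$, then prefer the $\theta$ whose fitted martingale gives the smallest pathwise dual standard deviation on held-out paths. All inputs are synthetic placeholders, not data from the cited work.

```python
import numpy as np
from scipy.optimize import linprog

# Grid search over theta, assuming the epigraph LP of Section 4: fit alpha on
# training paths for each theta, then prefer the theta whose martingale gives
# the smallest pathwise dual standard deviation on held-out paths.
# All inputs are synthetic placeholders.
rng = np.random.default_rng(4)
N, T, K = 120, 8, 4
B = np.concatenate([np.zeros((N, 1, K)),
                    rng.normal(size=(N, T, K)).cumsum(axis=1)], axis=1)
X = rng.normal(size=(N, T + 1))
xi_term = rng.normal(size=(N, T + 1))        # xi_t * (Yhat* - X + Ahat*), toy

def fit_alpha(idx, theta):
    """Solve min (1/|idx|) sum_n z_n s.t. z_n >= X - alpha.B + theta*xi_term."""
    m = len(idx)
    c = np.concatenate([np.zeros(K), np.ones(m) / m])
    A_ub, b_ub = [], []
    for i, n in enumerate(idx):
        for t in range(T + 1):
            row = np.zeros(K + m)
            row[:K], row[K + i] = -B[n, t], -1.0
            A_ub.append(row)
            b_ub.append(-(X[n, t] + theta * xi_term[n, t]))
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * (K + m))
    return res.x[:K]

train, valid = np.arange(N // 2), np.arange(N // 2, N)
for theta in (0.0, 0.25, 0.5, 1.0):
    alpha = fit_alpha(train, theta)
    dual = (X[valid] - B[valid] @ alpha).max(axis=1)  # pathwise dual values
    print(f"theta={theta}: mean={dual.mean():.3f}  std={dual.std():.3f}")
```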

Summary Table: Key Features of the Variance-Adaptive Doob Martingale Framework

| Property | Classical Doob/Azuma | Variance-Adaptive Doob Martingale | Reference |
|---|---|---|---|
| Depends on realized variance? | No | Yes ($V_t$, $\widehat{V}_n$) | (Balsubramani, 2015; Girard et al., 17 Oct 2025) |
| Uniform over time/posteriors? | No | Yes, in $t$ and over PAC-Bayes posteriors $\rho$ | (Balsubramani, 2015) |
| Robust to randomization? | No (in general) | Yes, uniquely for $M^*$ among optimal $M$ | (Belomestny et al., 2021) |
| Attains zero-variance estimator? | No | Yes, if $M^*$ is exactly parametrized | (Belomestny et al., 2021) |

The introduction of variance-adaptive Doob martingales, self-normalized maximal inequalities, and randomized dual minimization algorithms constitutes a rigorous foundation for low-variance, robust, and adaptive inference in martingale-centric applications across optimal stopping, sequential decision-making, and adaptive learning (Belomestny et al., 2021, Balsubramani, 2015, Girard et al., 17 Oct 2025).
