Almost-Supermartingale Processes

Updated 25 November 2025

Almost-supermartingale processes are recursive stochastic sequences that relax strict martingale conditions to achieve explicit convergence rates.
They provide a unifying framework for iterative schemes like stochastic gradient descent, Oja’s PCA, and the Robbins–Monro algorithm under minimal assumptions.
The methodology leverages normalized supermartingale techniques, auxiliary slowdown functions, and concentration inequalities to secure quantitative, time-uniform convergence guarantees.

Almost-supermartingale processes generalize classical supermartingale sequences, providing a unifying analytical framework for the paper of stochastic iterative algorithms and convergence phenomena encountered in modern probability and optimization theory. These processes are defined by recursive inequalities that relax the strict contraction properties of martingales, enabling sharp quantitative and time-uniform convergence rates with minimal requirements on the underlying structure. Central instances include the Robbins–Siegmund convergence lemma, Dvoretzky’s theorem for noisy Hilbert-space recursions, and stochastic quasi-Fejér monotonicity in metric spaces, with direct implications for stochastic approximation schemes such as stochastic gradient descent, Oja’s PCA algorithm, and the Robbins–Monro procedure (Neri et al., 17 Apr 2025, Pham et al., 23 Nov 2025).

1. Formal Definitions and Relaxed Supermartingale Conditions

The almost-supermartingale condition is formulated as follows: Let $(X_n), (A_n), (C_n)$ be nonnegative, integrable, $\mathcal{F}_n$ –adapted processes on a filtered probability space $(\Omega,\mathcal{F},P)$ . The “relaxed supermartingale” or almost-supermartingale condition is

$E\bigl[X_{n+1}\mid\mathcal{F}_n\bigr] \le (1+A_n) X_n + C_n \quad \text{a.s., for all } n.$

This is complemented by:

Bounded perturbations: $\prod_{i=0}^\infty (1+A_i) < K$ a.s. for some finite $K$ ,
Summable error-terms: There exists $\chi: (0,\infty)\to\mathbb{N}$ such that for all $\varepsilon>0$ ,

$\sum_{i=\chi(\varepsilon)}^\infty E[C_i] < \varepsilon.$

A canonical instance is the process $\{L_t\}_{t\ge 0}$ with noise process $\{U_t\}_{t\ge 1}$ and stepsizes $\{\eta_t\}_{t\ge 1}$ , satisfying for deterministic constants $C_1\in(0,1)$ , $C_2,C_3>0$ , and exponents $a_i,b_i,c_i,d_i>0$ : $L_t \le (1 - C_1 \eta_t) L_{t-1} + U_t,$ with suitably bounded conditional mean and magnitude of $U_t$ (Pham et al., 23 Nov 2025).

2. General Convergence Theorems and Quantitative Rates

Almost-supermartingale recursions admit explicit convergence rates in mean and almost surely via auxiliary “slowdown” functions $f$ which are required to be super-multiplicative, increasing, concave, and continuous (s.i.c.c.). Precisely, if $f$ is s.i.c.c.\ with moduli $\psi, \kappa$ , and $\varphi$ is a $\liminf$ -modulus for $(E[f(X_n)])$ , then:

$E[f(X_n)] \to 0$ at a rate

$\rho(\varepsilon) = \varphi\left(\frac{\varepsilon\,\psi(K^{-1})}{2},\, \chi\left(\kappa\left(\frac{\varepsilon\,\psi(K^{-1})}{2}\right)\right)\right),$

$X_n\to 0$ almost surely with rate

$\rho'(\lambda,\varepsilon) = \rho(\lambda f(\varepsilon)),$

meaning $P(\exists n\ge \rho'(\lambda,\varepsilon): X_n\ge \varepsilon) < \lambda$ (Neri et al., 17 Apr 2025).

The proof strategy involves normalizing the process to a true supermartingale, applying Jensen’s inequality to $f$ , using Ville’s inequality for high-probability bounds, and leveraging the tail-sum bound on $C_n$ . These rates depend only on perturbation and error moduli, not on additional process structure.

3. Key Theoretical Instantiations

Specific instantiations of the almost-supermartingale framework include:

Quantitative Robbins–Siegmund Theorem: Given

$E[X_{n+1}\mid \mathcal F_n] \le (1+a_n) X_n - u_n V_n + C_n$

with $\sum a_n<\infty$ , $\sum C_n<\infty$ , and $\sum u_n=\infty$ (divergence-rate $\theta$ ), explicit convergence rates for $E[f(X_n)] \to 0$ and $X_n\to 0$ a.s. are obtained via explicit functionals of the summability moduli and regularity of the auxiliary process (Neri et al., 17 Apr 2025).

Quantitative Dvoretzky’s Theorem: For Hilbert-space-valued recursions $x_{n+1} = T_{n+1}(x_0,\dots,x_n) + y_n$ with $E[y_n\mid\mathcal{F}_n]=0$ , a.s. convergence and high-probability concentration rates are derived, relying solely on process summability and rate moduli (Neri et al., 17 Apr 2025).
Stochastic quasi-Fejér Monotonicity: For sequences $(x_n)$ in a metric space with quasi-Fejér property

$E[\phi(x_{n+1},z)\mid\mathcal{F}_n] \le (1+\zeta_n) \phi(x_n,z) + \xi_n,$

rates for $E[\phi(x_n, z)]\to 0$ and almost sure convergence are given in terms of rate moduli for $\zeta_n$ and error process $\xi_n$ (Neri et al., 17 Apr 2025).

Robbins–Monro Algorithm: For $x_{n+1} = x_n - a_n y_n$ under moment, monotonicity, and regularity constraints, convergence is established with explicit rates in the strongly monotone and general cases (Neri et al., 17 Apr 2025).

4. Time-Uniform Bounds and Concentration Sequences

A major development is the derivation of time-uniform or any-time high-probability bounds. Under strengthened almost-supermartingale recursions,

$L_t \le (1-C_1\eta_t) L_{t-1} + U_t,$

with noise control and $\min_{i}\{a_i + b_i, c_i + d_i\} > 1$ , one obtains

$\P\left(\forall t\ge 0: L_t \le M\,\frac{\log(1/\delta) + \log\log(t+10)}{t+10}\right) \ge 1-2\delta$

for appropriate $M$ , matching law-of-iterated-logarithm lower bounds (Pham et al., 23 Nov 2025). The proof employs interval stopping, drift-dominated concentration inequalities (Azuma/Freedman type), and stitching arguments.

Compared to exponential supermartingale approaches—where martingale transforms of the form $M_t(\lambda) = \exp[\lambda S_t - \psi(\lambda)V_t]$ are constructed—almost-supermartingale methods bypass the need for tractable exponential martingales and apply directly in settings such as Oja's algorithm or stochastic approximation where classical approaches are not feasible.

5. Applications in Stochastic Approximation and Beyond

Almost-supermartingale frameworks yield comprehensive, quantitative guarantees for a wide array of stochastic iterative algorithms:

Stochastic Gradient Descent (SGD): In the strongly convex case, for SGD recursions, squared-error processes satisfy almost-supermartingale inequalities. The result is

$\P\Bigl(\forall t \ge 1: \|x_t-x^*\|^2 \le O\left(\frac{\log(1/\delta)+\log\log t}{t}\right)\Bigr) \ge 1-\delta$

with explicit prefactors depending on noise and curvature parameters (Pham et al., 23 Nov 2025).

Polyak–Łojasiewicz Processes: For objectives satisfying the PL condition, time-uniform bounds for suboptimality gaps $F(x_t) - F(x^*)$ match the same rate (Pham et al., 23 Nov 2025).
Oja's Streaming PCA: After an initial “warm-up” to ensure $L_t \leq 1/4$ with high probability, the squared-sine angle error sequence for top-eigenvector estimation satisfies the almost-supermartingale property, yielding

$\P\left(\forall t: \sin^2\angle(v_t, v_1) \leq \max\left\{O\left(\frac{L}{\log(1/\delta)}\right), O\left(\frac{\log(1/\delta) + \log\log t}{t}\right)\right\}\right) \geq 1-2(e+1)\delta$

(Pham et al., 23 Nov 2025).

In the Robbins–Monro context, $O(1/n)$ convergence rates are recovered under linear regularity, and $O(1/\sqrt{n})$ rates for subgradient or more general settings. Applications extend to stochastic subgradient methods, proximal-point splitting, metric Fréchet means estimation, and Hadamard-space splitting with minimal additional assumptions (Neri et al., 17 Apr 2025).

6. Role of Moduli and Minimal-Data Dependence

A salient feature of almost-supermartingale convergence rates is their uniformity and mild data dependence: all rates are explicit in terms of

Product bounds on step perturbations ( $K$ ),
Tail-sum moduli for error terms ( $\chi$ ),
Lim-inf or divergence moduli ( $\varphi$ , $\theta$ ),
Regularity moduli linking auxiliary and main processes ( $\tau$ ).

No additional structural or geometric assumptions are required, and the methodology adapts to classical and modern iterative schemes with diverse stochastic perturbations. This minimal-data dependency underpins the wide applicability of the theory (Neri et al., 17 Apr 2025).

7. Comparative Methodologies and Significance

Classical exponential-supermartingale constructions (empirical Bernstein bounds, mixture martingales, self-normalized martingales) are powerful when exact exponential martingale structures are accessible. However, almost-supermartingale methods:

Require only a recursive contraction plus bounded noise,
Apply to matrix-product and other intractable update structures,
Yield the optimal $O\left(\frac{\log\log t}{t}\right)$ rate law, as proven for a wide spectrum of algorithms (Pham et al., 23 Nov 2025).

A plausible implication is that as optimization and learning algorithms grow in architectural complexity and nonlinearity, almost-supermartingale process theory supplies a flexible and robust analytical platform for precise convergence and concentration analysis.