
Wasserstein Martingale CLT for Online Optimization

Updated 9 February 2026
  • The Wasserstein martingale CLT quantifies the convergence of discrete-time martingales to a Gaussian law, as measured in the 1-Wasserstein distance.
  • It leverages Stein's method to design online learning algorithms that achieve minimax-optimal loss and additively optimal (O(log T)) regret bounds.
  • The approach uses telescoping decompositions and Lipschitz continuity to derive explicit error bounds, offering robust improvements over classical methods like OGD and MWU.

The Wasserstein Martingale Central Limit Theorem (CLT) establishes a precise quantitative connection between discrete-time martingale processes and their normal approximations in the 1-Wasserstein distance. It has recently been leveraged to design online learning algorithms whose guarantees are not only minimax-optimal in leading order but also additively optimal up to logarithmic residuals. These techniques build on Stein's method and the analytic structure of the Wasserstein distance to yield operationally efficient online linear optimization (OLO) algorithms with sharp performance tradeoffs. The connection to the Röllin (2018) Wasserstein martingale CLT and its operationalization for OLO is detailed in "Operationalizing Stein's Method for Online Linear Optimization: CLT-Based Optimal Tradeoffs" (Zhang et al., 6 Feb 2026).

1. Foundations: Wasserstein Martingale CLT

The 1-Wasserstein distance between two probability measures $\mu$ and $\nu$ on $\mathbb{R}$ is

$$d_W(\mu,\nu) = \sup_{\|h\|_{\mathrm{Lip}}\leq 1} \left| \mathbb{E}_\mu[h] - \mathbb{E}_\nu[h] \right|,$$

where the supremum is over all functions $h$ with Lipschitz constant at most $1$. The classical CLT gives convergence in distribution; Wasserstein CLTs, and in particular the martingale variant [Röllin 2018], instead quantify the rate at which sums of martingale difference sequences approach a normal law in $d_W$, with explicit bounds.

For a real-valued martingale difference sequence $(X_t)$ such that $\sum_{t=1}^T \mathbb{E}[X_t^2 \mid \mathcal{F}_{t-1}] = 1$, the martingale CLT states

$$d_W\!\left(\sum_{t=1}^T X_t,\, Z \right) \leq C \sum_{t=1}^T \mathbb{E}\left[ |X_t|^3 \right],$$

where $Z \sim \mathcal{N}(0,1)$ and $C$ is a universal constant. Röllin (2018) gives particularly clean proofs and explicit (often optimal) constants.
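
The scaling of this bound can be checked empirically. The sketch below is an illustration of our own (not from the paper): the Rademacher increments, horizon, and sample sizes are arbitrary choices. It estimates the 1-Wasserstein distance between a normalized martingale sum and a standard Gaussian:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
T, n_samples = 200, 20_000

# Martingale differences: iid Rademacher steps scaled so the
# conditional variances sum to 1 (each variance equals 1/T).
X = rng.choice([-1.0, 1.0], size=(n_samples, T)) / np.sqrt(T)
S = X.sum(axis=1)                    # martingale value at time T
Z = rng.standard_normal(n_samples)   # Gaussian reference samples

d_emp = wasserstein_distance(S, Z)   # empirical d_W estimate
# CLT bound scale: sum E|X_t|^3 = T * T^(-3/2) = 1/sqrt(T)
print(f"empirical d_W ~ {d_emp:.4f}, bound scale 1/sqrt(T) = {1/np.sqrt(T):.4f}")
```

The empirical distance sits well within the $C/\sqrt{T}$ envelope predicted by the bound (here $1/\sqrt{T} \approx 0.07$), up to Monte Carlo noise.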

2. Online Learning and the Stein-Wasserstein Approach

Stein's method is a powerful analytic technique for proving distributional approximations. In the OLO context, it allows one to collapse the discrete dynamic programming recursion that describes optimal betting or online prediction into a tractable differential-operator form. In (Zhang et al., 6 Feb 2026), this methodology is operationalized to produce OLO algorithms which enjoy additively sharp bounds governed by the same quantities that appear in Wasserstein CLTs.

Algorithmic Structure

The core algorithm iterates the following structure:

  1. At each round $t$, maintain a running sum $s_{t-1}$ of observed gradients/losses.
  2. Solve the Stein equation:

$$\sigma^2 f'(x) - (x-\mu)f(x) = h(x) - \mathbb{E}[h(\mu+\sigma Z)]$$

for a chosen $1$-Lipschitz or convex function $h$.

  3. Set the learner's action $x_t$ at round $t$ to

$$x_t = \mathbb{E}_{Z \sim \mathcal{N}(0,1)}\bigl[ f_{s_{t-1}, \rho_{t-1}, h}(s_{t-1} + \rho_t Z) \bigr].$$

  4. Update $s_t = s_{t-1} + g_t$ after observing the new gradient $g_t$.
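
The Stein equation in step 2 admits the classical integral solution for the $\mathcal{N}(\mu,\sigma^2)$ Stein operator. The following is a minimal numerical sketch of that textbook solution, not the paper's implementation; the test function $h(x)=|x|$ and the parameter values are our own illustrative choices. It solves the equation by quadrature and verifies it pointwise with a central difference:

```python
import numpy as np
from scipy.integrate import quad

mu, sigma = 0.0, 1.5
h = np.abs                               # a 1-Lipschitz test function
Eh = sigma * np.sqrt(2 / np.pi)          # E[h(mu + sigma*Z)] for h = |.|, mu = 0

def f(x):
    """Classical integral solution of sigma^2 f'(x) - (x - mu) f(x) = h(x) - Eh."""
    y = (x - mu) / sigma
    integrand = lambda t: (h(mu + sigma * t) - Eh) / sigma * np.exp(-t * t / 2)
    val, _ = quad(integrand, -np.inf, y)
    return np.exp(y * y / 2) * val

# Verify the Stein equation at a point away from the kink of h.
x0, eps = 0.7, 1e-4
lhs = sigma**2 * (f(x0 + eps) - f(x0 - eps)) / (2 * eps) - (x0 - mu) * f(x0)
rhs = h(x0) - Eh
print(f"Stein residual: {lhs - rhs:.2e}")   # expected to be near zero
```

The same quadrature would also serve to evaluate the Gaussian expectation defining $x_t$ in step 3.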

This uses a "telescoping/Lindeberg" construction matching that in Röllin's martingale CLT proof, connecting the discrete-time sum of the adversary's sequence to a Gaussian reference via intermediate interpolants $m_t$.

Additively Sharp Guarantees

The effect is that, for any convex $1$-Lipschitz $h$ with $\mathbb{E}[h(Z)]=0$ for $Z \sim \mathcal{N}(0,T)$,

$$L_T = \sum_{t=1}^T g_t x_t \leq -h\!\left( \sum_{t=1}^T g_t \right) + O(\log T),$$

where $O(\log T)$ is a provably tight additive term, reflecting the Wasserstein convergence rate (Zhang et al., 6 Feb 2026).

Similarly, for all comparators $u$ in the feasible set,

$$R_T(u) \leq h^*(-u) + \mathbb{E}[h(\sqrt{T} Z)] + O(\log T),$$

where $h^*$ is the Fenchel-Legendre conjugate.
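
As a concrete instantiation (our own worked example; the choice $h(x) = |x|$ is illustrative and not taken from the paper), the regret bound specializes as follows:

```latex
% For h(x) = |x| (convex, 1-Lipschitz), the Fenchel-Legendre conjugate is
%   h^*(v) = 0 for |v| <= 1, and +infinity otherwise,
% while E[h(\sqrt{T} Z)] = \sqrt{T}\,\mathbb{E}|Z| = \sqrt{2T/\pi}.
% Hence, uniformly over comparators with |u| <= 1:
R_T(u) \;\le\; \sqrt{\tfrac{2T}{\pi}} + O(\log T),
```

recovering the familiar $\sqrt{T}$ leading term together with its sharp Gaussian constant.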

3. Comparative and Robustness Properties

The Wasserstein martingale CLT apparatus instantiated via operational Stein methods underpins tradeoffs beyond leading-order big-O optimality. These tradeoffs are not available with classical algorithms such as OGD or MWU, which are generally only asymptotically optimal and lack sharp additive control.

Distinctive features include:

  • Uniform improvement over OGD and MWU in regret tradeoffs, regardless of parameterization (see Section 2.3 in (Zhang et al., 6 Feb 2026)).
  • Realization of an optimal tradeoff curve between total loss and maximum regret across all comparators; the Pareto frontier is given via a soft-thresholded hh tied to the CLT rate function.
  • Extension to expectation-based guarantees under noise or martingale difference feedback, with the algorithm's performance matching the Wasserstein error of the underlying martingale CLT.

4. Analytical Proof Techniques

The central proof artifact is a telescoping decomposition:

$$\mathbb{E}_Z[h(m_T)] - \mathbb{E}_Z[h(m_0)] = \sum_{t=1}^T \left( \mathbb{E}_Z h(s_t + \rho_t Z) - \mathbb{E}_Z h(s_{t-1} + \rho_{t-1} Z) \right),$$

with each difference mapped to a Stein equation bounding term, using explicit uniform Stein factor bounds:

$$\|f\|_\infty \leq 1, \quad \|f'\|_\infty \leq \sqrt{\tfrac{2}{\pi}}\, \sigma^{-1}, \quad \|f''\|_\infty \leq 2 \sigma^{-2},$$

which yields the $O(\log T)$ additive residual. Monotonicity of the Stein solution for convex $h$ (Lemma 2.4) is key for robustness of the comparison.
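
The derivative bound above can be sanity-checked numerically. The sketch below (again illustrative, with our own parameter and test-function choices; $h(x)=|x|$, $\mu=0$) evaluates the derivative of the Stein solution on a grid and compares it against $\sqrt{2/\pi}\,\sigma^{-1}$:

```python
import numpy as np
from scipy.integrate import quad

sigma = 2.0
h = np.abs                              # 1-Lipschitz test function
Eh = sigma * np.sqrt(2 / np.pi)         # E[h(sigma*Z)]

def f(x):
    """Classical solution of sigma^2 f'(x) - x f(x) = h(x) - Eh (mu = 0)."""
    y = x / sigma
    integrand = lambda t: (h(sigma * t) - Eh) / sigma * np.exp(-t * t / 2)
    val, _ = quad(integrand, -np.inf, y)
    return np.exp(y * y / 2) * val

# f'(x) recovered exactly from the Stein equation itself.
fprime = lambda x: (x * f(x) + h(x) - Eh) / sigma**2

grid = np.linspace(-4 * sigma, 4 * sigma, 81)
max_fp = max(abs(fprime(x)) for x in grid)
bound = np.sqrt(2 / np.pi) / sigma
print(f"max |f'| on grid = {max_fp:.4f} <= bound {bound:.4f}")
```

For this $h$ the bound is in fact attained at $x = 0$, where $f'(0) = -\sqrt{2/\pi}\,\sigma^{-1}$, which is consistent with the factor bounds being tight.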

These arguments quantify the optimality gap between the online algorithm's performance and the normal approximation for adversarial (and in certain extensions, stochastic) feedback (Zhang et al., 6 Feb 2026).

5. Broader Implications and Open Directions

The operationalization of the Wasserstein martingale CLT for online optimization unifies:

  • Classic dynamic programming-based optimal betting strategies (e.g., Cover's algorithm for the one-dimensional OLO problem).
  • Normal approximation theory, determining both asymptotic and non-asymptotic algorithmic optimality.
  • Stein's method as an algorithmic design principle, not merely a probabilistic proof tool.

Open questions include further generalization to vector-valued or matrix-valued martingale processes, as well as characterizing the exact optimal residuals in high-dimensional and infinite-dimensional settings.

A plausible implication is that this approach delineates the ultimate achievable risk in online convex prediction, both in finite time and in minimax settings, matching the Wasserstein convergence rates of the associated martingale CLT (Zhang et al., 6 Feb 2026). This unifies probabilistic limit theory and online regret optimization at an unprecedented level of quantitative precision.

