Self-Normalized Martingales and Uniform Regret Bounds for Linear Regression

Published 2 May 2026 in stat.ML, cs.LG, and math.ST | (2605.01628v1)

Abstract: Self-normalized martingale inequalities lie at the heart of confidence ellipsoids for online least squares and, more broadly, many bandit and reinforcement-learning results. Yet existing vector and scalar results typically rely on bounded covariates and an explicit regularization matrix, producing bounds that are not scale-invariant: although the self-normalized quantity is scale-invariant by definition, its standard upper bounds are not. We characterize when scale-invariant upper bounds on self-normalized martingales are possible. Without further assumptions, we prove that nontrivial scale-invariant bounds exist only in dimension $d=1$; moreover, in $d=1$ we obtain $O(\log T)$ scale-invariant self-normalized bounds without any assumptions on the covariates. In contrast, for $d>1$ we show that no nontrivial scale-invariant bound can hold in full generality. We then connect this dichotomy to doubly-uniform regret in online linear regression (i.e., regret bounds that are simultaneously independent of the covariate scale and the comparator norm) and use it to resolve the open question of Gaillard, Gerchinovitz, Huard, and Stoltz, "Uniform regret bounds over $\mathbb{R}^d$ for the sequential linear regression problem with the square loss" (ALT 2019): in $d=1$ we give an explicit algorithm with $O(\log T)$ doubly-uniform regret, whereas for $d>1$ sublinear doubly-uniform regret is impossible. Finally, under a natural smoothness condition (bounded Radon–Nikodym derivatives of the conditional covariate laws with respect to a fixed base measure), we recover sublinear regret for $d>1$ without bounded covariates and derive a self-normalized concentration inequality free of the usual regularization penalties, yielding arguably a first natural scale-invariant bound for adaptive, non-i.i.d. vector martingales.


Authors (4)

Summary

The paper establishes O(log T) regret bounds for one-dimensional online linear regression using scale-invariant self-normalized martingale inequalities.
It demonstrates that in higher dimensions, scale-invariant guarantees are impossible without additional smoothness assumptions, leading to linear regret in adversarial settings.
Under natural smooth covariate conditions, the authors recover sublinear regret and derive scale-invariant concentration bounds without requiring explicit regularization.

Introduction and Problem Setting

The paper "Self-Normalized Martingales and Uniform Regret Bounds for Linear Regression" (2605.01628) studies how self-normalized martingale inequalities relate to regret analysis in online linear regression. Its focus is scale-invariant control and double uniformity, meaning regret guarantees that are independent of both the covariate scale and the comparator norm, without boundedness assumptions. These questions are central to confidence-set construction in online learning and high-dimensional statistics, with consequences for reinforcement learning, linear bandits, and minimax learning theory.

Classically, self-normalized processes provide concentration inequalities with denominators reflecting the observed data's scale. However, extant upper bounds require explicit regularization (e.g., additive constants or regularization matrices in the Gram matrix), inherently breaking scale-invariance, especially in vector-valued and adversarial settings.

Main Contributions

The paper contains a rigorous theoretical analysis establishing:

- A sharp dichotomy between $d=1$ and $d>1$ in scale-invariant self-normalized martingale inequalities and doubly-uniform regret for online regression:
  - For $d=1$, the authors obtain tight $O(\log T)$, fully scale-invariant bounds for dyadic (adversarial) self-normalized martingales with no covariate assumption. This translates, via explicit construction and equivalence arguments, into $O(\log T)$ doubly-uniform regret bounds for online linear regression, settling an open problem from [Gaillard et al., ALT 2019].
  - For $d>1$, no nontrivial scale-invariant upper bounds exist for self-normalized vector martingales under arbitrary (potentially adversarial) covariates. As a consequence, sublinear doubly-uniform regret is impossible in the general adversarial online regression setting.
- Recovery of sublinear regret and scale-invariant self-normalized concentration under a natural smoothed environment assumption for $d>1$:
  - Under smoothness, where conditional distributions of covariates are absolutely continuous with respect to a fixed base measure with density bounded by $\kappa$, sublinear regret bounds of order $O(\sqrt{dT\log(T)})$ (up to logarithmic factors) are obtained without boundedness, and without explicit regularization in the concentration inequality.
  - The resulting concentration inequalities for adaptive, non-i.i.d. vector martingales are scale-invariant and avoid dependence on the maximum covariate norm or log-determinant penalties, marking a significant theoretical advance.

Technical Highlights and Results

1. Self-Normalized Martingale Inequalities

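Scalar Case ($d=1$): The paper derives explicit exponential moment and expectation bounds for the self-normalized process $\left(\sum_{t=1}^{T} x_t \epsilon_t\right)^2 / \sum_{t=1}^{T} x_t^2$, with $\epsilon_t$ the martingale noise, for arbitrary ($\mathcal{F}_{t-1}$-measurable) $x_t$ without any moment or boundedness condition. The main result shows

$$\mathbb{E}\left[\frac{\left(\sum_{t=1}^{T} x_t \epsilon_t\right)^2}{\sum_{t=1}^{T} x_t^2}\right] = O(\log T)$$

with constants independent of the covariate sequence. This scale-invariant control is essentially tight via matching lower bounds.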
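The scale-invariance at stake in the scalar case can be checked numerically. The following is a minimal sketch, not the paper's construction: the function name `self_normalized_sq`, the $\pm 1$ noise, the Gaussian toy covariates, and the regularizer `lam` are all assumptions made for illustration. It shows only that the unregularized ratio is unchanged when every covariate is multiplied by a constant `c`, whereas adding a fixed regularizer to the denominator makes the value depend on `c`.

```python
import random

def self_normalized_sq(xs, eps, lam=0.0):
    """Return (sum_t x_t * eps_t)^2 / (lam + sum_t x_t^2).

    lam = 0 gives the unregularized ratio; lam > 0 mimics an additive regularizer.
    """
    num = sum(x * e for x, e in zip(xs, eps)) ** 2
    den = lam + sum(x * x for x in xs)
    return num / den if den > 0 else 0.0

rng = random.Random(0)
T = 1000
eps = [rng.choice((-1.0, 1.0)) for _ in range(T)]   # assumed +/-1 noise
xs = [rng.gauss(0.0, 1.0) for _ in range(T)]        # fixed (non-adaptive) toy covariates

for c in (1e-3, 1.0, 1e3):
    scaled = [c * x for x in xs]
    plain = self_normalized_sq(scaled, eps)           # unchanged by c, up to rounding
    ridge = self_normalized_sq(scaled, eps, lam=1.0)  # shrinks toward 0 as c -> 0
    print(f"c={c:g}  unregularized={plain:.6f}  regularized={ridge:.6f}")
```

The covariates here are drawn in advance, so the sketch says nothing about the adaptive, worst-case sequences the result covers; it only illustrates the invariance property that a scale-invariant bound must respect.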

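Vector Case ($d>1$): It is shown that no sublinear, scale-invariant upper bound on the canonical self-normalized quadratic form

$$\left(\sum_{t=1}^{T} x_t \epsilon_t\right)^{\top} \left(\sum_{t=1}^{T} x_t x_t^{\top}\right)^{-1} \left(\sum_{t=1}^{T} x_t \epsilon_t\right)$$

can hold for general adaptively chosen $x_t$ (here $\epsilon_t$ denotes the noise). Explicit constructions demonstrate that for all $d \geq 2$ and any horizon $T$, there exist covariates such that the normalized quadratic form is at least of order $T$ in expectation.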
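To make the vector quantity concrete, here is a small sketch for $d=2$ that evaluates the quadratic form $S^\top V^{-1} S$, with $S=\sum_t \epsilon_t x_t$ and $V=\sum_t x_t x_t^\top$, using the closed-form inverse of a $2\times 2$ matrix. The helper name `quad_form_2d`, the $\pm 1$ noise, and the Gaussian covariates are illustrative assumptions, not taken from the paper. The sketch checks two standard facts: the form is unchanged when each coordinate is rescaled separately, and for $\pm 1$ noise it never exceeds $T$, because it is the squared norm of a projection of the noise vector. A nontrivial bound therefore means one that grows sublinearly in $T$.

```python
import random

def quad_form_2d(xs, eps):
    """Compute S^T V^{-1} S for d = 2, with S = sum_t eps_t x_t and V = sum_t x_t x_t^T."""
    s0 = sum(e * x[0] for x, e in zip(xs, eps))
    s1 = sum(e * x[1] for x, e in zip(xs, eps))
    a = sum(x[0] * x[0] for x in xs)   # V[0][0]
    b = sum(x[0] * x[1] for x in xs)   # V[0][1] = V[1][0]
    c = sum(x[1] * x[1] for x in xs)   # V[1][1]
    det = a * c - b * b
    if det <= 0.0:
        raise ValueError("Gram matrix is singular; a pseudo-inverse would be needed")
    # The inverse of [[a, b], [b, c]] is [[c, -b], [-b, a]] / det.
    return (c * s0 * s0 - 2.0 * b * s0 * s1 + a * s1 * s1) / det

rng = random.Random(1)
T = 500
eps = [rng.choice((-1.0, 1.0)) for _ in range(T)]                     # assumed +/-1 noise
xs = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(T)]   # toy covariates
rescaled = [(1e3 * x0, 1e-3 * x1) for x0, x1 in xs]                   # per-coordinate rescaling

q = quad_form_2d(xs, eps)
q_rescaled = quad_form_2d(rescaled, eps)
print(q, q_rescaled, T)   # the first two agree up to rounding; both are at most T
```

With covariates fixed in advance, as here, the value stays far below $T$; the impossibility result concerns adaptively chosen covariates, which this sketch does not attempt to reproduce.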

2. Tight Regret Bounds for Online Linear Regression

One-Dimensional Case: Direct translation via the martingale/regret equivalence yields $O(\log T)$ regret uniformly over all comparator norms and covariate scales. The paper describes an explicit meta-algorithm leveraging scale-sensitive partitioning with appropriate aggregations of regularized least squares subroutines to operate without a priori scale knowledge.
Higher Dimensions: The impossibility results imply that for adversarial environments, sublinear doubly-uniform regret is unattainable; i.e., the minimax regret scales linearly in $T$ in the worst case, even under bounded covariates.
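For reference, the regret notion behind these statements can be written out as follows. The notation ($\hat{y}_t$ for the learner's prediction, $u$ for the comparator, $R_T$ for the regret) is assumed here for illustration and follows the usual square-loss online regression protocol; it is not copied from the paper.

```latex
R_T(u) \;=\; \sum_{t=1}^{T} \bigl(\hat{y}_t - y_t\bigr)^2 \;-\; \sum_{t=1}^{T} \bigl(u^\top x_t - y_t\bigr)^2,
\qquad u \in \mathbb{R}^d .
```

In this notation, a doubly-uniform guarantee is a bound on $\sup_{u \in \mathbb{R}^d} R_T(u)$ that depends neither on $\|u\|$ nor on the scale of the covariates $x_1, \dots, x_T$.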

3. Sublinear Regret and Concentration via Smoothness

Smooth Covariates Assumption: Under the requirement that conditional laws (given the past) of $x_t$ admit bounded Radon–Nikodym derivatives with respect to a fixed reference measure (density bounded by $\kappa$), the authors bound the regret of the (unregularized) Vovk–Azoury–Warmuth (VAW) predictor by a sublinear quantity of order $O(\sqrt{dT\log(T)})$, up to logarithmic factors, with high probability and without a maximum-norm or regularization penalty. The argument employs an elliptical potential-based combinatorial decomposition, leveraging coupling of the (potentially adaptive) sequence to an i.i.d. one.
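As a rough illustration of why the unregularized VAW predictor suits scale-free analysis, the sketch below implements the forecaster in one dimension, where its prediction is $\hat{y}_t = x_t \sum_{s<t} x_s y_s / (\lambda + \sum_{s\le t} x_s^2)$. The function names, the toy data, and the restriction to $d=1$ are assumptions made to keep the example dependency-free; in higher dimensions the unregularized version would require a pseudo-inverse. With $\lambda = 0$ the predictions, and hence the regret, are unchanged when all covariates are multiplied by a constant; with $\lambda > 0$ they are not.

```python
import random

def vaw_predictions_1d(xs, ys, lam=0.0):
    """One-dimensional VAW forecaster; lam = 0 is the unregularized variant."""
    preds = []
    gram = lam     # lam + sum_{s <= t} x_s^2 (includes the current covariate)
    corr = 0.0     # sum_{s < t} x_s * y_s (past rounds only)
    for x, y in zip(xs, ys):
        gram += x * x
        preds.append(x * corr / gram if gram > 0 else 0.0)
        corr += x * y   # y_t is revealed only after predicting
    return preds

def regret_1d(xs, ys, preds):
    """Cumulative square loss minus that of the best fixed slope in hindsight."""
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys))
    gram = sum(x * x for x in xs)
    u_star = sum(x * y for x, y in zip(xs, ys)) / gram if gram > 0 else 0.0
    best = sum((u_star * x - y) ** 2 for x, y in zip(xs, ys))
    return loss - best

rng = random.Random(2)
T = 200
xs = [rng.gauss(0.0, 1.0) for _ in range(T)]             # toy covariates
ys = [0.7 * x + rng.choice((-0.5, 0.5)) for x in xs]     # toy responses
scaled = [1e3 * x for x in xs]                           # same data, rescaled covariates

p0, p0s = vaw_predictions_1d(xs, ys), vaw_predictions_1d(scaled, ys)
p1, p1s = vaw_predictions_1d(xs, ys, lam=1.0), vaw_predictions_1d(scaled, ys, lam=1.0)
print(regret_1d(xs, ys, p0), regret_1d(scaled, ys, p0s))  # equal up to rounding
print(max(abs(p - q) for p, q in zip(p1, p1s)))           # nonzero: lam breaks invariance
```

The example only demonstrates the invariance of the predictor; it does not reproduce the smoothness-based regret analysis described above.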

Scale-Invariant Concentration: The same smoothness argument yields arguably the first natural scale-invariant concentration inequality for adaptive, nonstationary vector martingales, free of explicit regularization matrices. The upper bound is independent of the maximum covariate norm and carries no log-determinant penalty; only the smoothness parameter $\kappa$ appears, and the result reduces to the standard i.i.d. regime in the case of independent draws.

Implications and Theoretical Significance

This work settles a central open question concerning the possibility (and impossibility) of obtaining regret bounds that enjoy double uniformity: independence from both data geometry (covariate scale/conditioning) and the comparator norm for online linear regression. The results show that the scalar case is qualitatively distinct from higher dimensions. From a technical perspective, the equivalence between self-normalized martingale inequalities and minimax online regression regret is refined and operationalized via Bellman recursion arguments.

On the practical side, the results delineate the precise conditions under which scale-invariant online least squares confidence bounds—and thereby practical, parameter-free linear bandit and reinforcement learning algorithms—can be justified. In high dimensions, the necessity of some restriction or regularization (data boundedness, smoothness, or otherwise) is non-negotiable for achieving meaningful uncertainty control and regret bounds in adversarial and adaptive settings.

Future Directions

This theoretical framework motivates several directions:

Refined Smoothness or Stochastic Assumptions: The smoothness model interpolates between worst-case adversarial design and fully stochastic (i.i.d.) scenarios. Further investigation may identify the weakest conditions under which scale-invariant concentration and regret bounds hold in practical online environments (e.g., heavy-tailed settings, kernelized regression).
Extension to Nonlinear and Nonconvex Predictors: The machinery developed could be extended to nonlinear prediction (e.g., kernel methods) or nonconvex loss landscapes, exploring analogues of self-normalized concentration for more complex models.
Algorithmic Specialization: Constructing efficient, adaptive algorithms that operationalize the theory under various stochastic regularity conditions—and quantifying their empirical and theoretical tradeoffs—remains an open challenge.

Conclusion

The paper establishes a fundamental dichotomy for scale-invariant martingale control and doubly-uniform regret in online linear regression, resolving a notable open question. It shows that such guarantees are fundamentally restricted to the scalar ($d=1$) case without further structure, and recovers them in high dimensions only via assumptions on design randomness. The results bridge classical martingale theory, online learning, and statistical algorithmic analysis, and set theoretical boundaries on the scope of parameter-free learning guarantees in adversarial online regression protocols (2605.01628).