Doubly-Uniform Regret: Robustness Across Two Axes

Updated 5 July 2026

Doubly-uniform regret is defined as a guarantee that holds simultaneously along two distinct parameters, removing dependence on any single operating regime.
It is applied in settings like discounted online convex optimization, linear regression, reinforcement learning, and dynamic pricing, adapting to unknown discount factors, comparator norms, or accuracy thresholds.
This framework promotes algorithmic robustness by ensuring a single, unified bound over a continuum of objectives, reducing the need for parameter tuning and enhancing performance guarantees.

Doubly-uniform regret is a family of regret guarantees in which a single bound is required to hold simultaneously along two distinct axes of uncertainty or performance measurement. The term does not have a single universal definition across the literature. In discounted online convex optimization, it refers to uniformity over both the discount factor $\lambda$ and the horizon $T$ (Yang et al., 26 May 2025). In online linear regression, it denotes regret that is simultaneously uniform in the comparator $w$ and invariant to scaling of the covariates (Chen et al., 2 May 2026). In episodic reinforcement learning, the closely related Uniform-PAC framework yields guarantees that are uniform over both accuracy $\varepsilon$ and time $T$ (Dann et al., 2017). In dynamic contextual pricing, the term is used for regret bounds that are uniform over two nonparametric model classes, namely the mean utility function and the noise distribution (Chen et al., 2024). Earlier adjacent usages include “twice uniform regret” in sequential linear regression, meaning uniformity over comparator vectors and worst-case sequences (Gaillard et al., 2018), and uniformly bounded regret in both the number of candidates $n$ and the budget $k$ in the multi-secretary problem (Arlotto et al., 2017). This suggests that “doubly-uniform” is best understood as a structural property: a regret guarantee that removes dependence on two problem parameters or two specification choices at once.

1. Terminological scope and common structure

Across the cited works, doubly-uniform regret is not a single theorem but a recurring design objective. The shared pattern is that the learner is asked to compete without knowing in advance which member of a family of objectives, scales, smoothness classes, or horizons will be the relevant one. In each case, the regret guarantee is required to hold simultaneously, rather than after tuning to a single parameter.

In discounted online convex optimization, the target is a bound that holds “simultaneously for all $\lambda$ in a continuous interval” and is “essentially uniform over $T$” (Yang et al., 26 May 2025). In online linear regression with square loss, the bound must hold for all $w \in \mathbb{R}^d$ without dependence on the comparator norm and must also be scale-invariant under rescaling of the covariates (Chen et al., 2 May 2026). In Uniform-PAC reinforcement learning, one high-probability event controls both the number of $\varepsilon$-suboptimal episodes for all $\varepsilon > 0$ and the regret for all $T$ (Dann et al., 2017). In dynamic pricing, the uniformity is over two unknown nonparametric objects, the mean utility function and the noise distribution, with regret statements of the form

$T$ 7

for appropriate smoothness classes (Chen et al., 2024).

A plausible implication is that the phrase identifies a shift from single-parameter adaptivity to simultaneous robustness. Rather than optimizing for one comparator norm, one discount factor, one accuracy threshold, or one model class, the algorithm is analyzed against an entire continuum or product class.

2. Discounted online convex optimization

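In discounted online convex optimization, the core performance metric is the $\lambda$-discounted regret of the iterates $w_t$ on losses $f_t$ against a comparator $u$,

$$\sum_{t=1}^{T} \lambda^{T-t}\bigl(f_t(w_t) - f_t(u)\bigr),$$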

where recent losses are weighted more heavily than distant ones (Yang et al., 26 May 2025). Under convexity, bounded gradients, bounded domain, and range normalization, Online Gradient Descent with update

$$w_{t+1} = \Pi_{\mathcal{W}}\bigl[w_t - \eta \nabla f_t(w_t)\bigr]$$

achieves $O(1/\sqrt{1-\lambda})$ discounted regret when $\lambda$ is known, using

$w$ 3

with the explicit bound

$w$ 4

The step size and bound are independent of $T$ (Yang et al., 26 May 2025).
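The discounted-regret computation can be made concrete with a small sketch. It assumes the standard form of discounted regret, $\sum_{t=1}^{T} \lambda^{T-t}(f_t(w_t) - f_t(u))$, and uses one-dimensional quadratic losses $f_t(w) = (w - z_t)^2$ on $[-1, 1]$. The loss family, the function names, and the step size `eta` are illustrative assumptions, not the tuning from Yang et al.

```python
def project(w, lo=-1.0, hi=1.0):
    """Euclidean projection onto the interval [lo, hi]."""
    return max(lo, min(hi, w))


def ogd(targets, eta, w0=0.0):
    """Projected OGD on f_t(w) = (w - z_t)^2; returns the iterates w_1..w_T."""
    w, iterates = w0, []
    for z in targets:
        iterates.append(w)          # w_t is played before z_t is revealed
        grad = 2.0 * (w - z)        # gradient of (w - z)^2 at w_t
        w = project(w - eta * grad)
    return iterates


def discounted_regret(iterates, targets, u, lam):
    """sum_t lam^(T-t) * (f_t(w_t) - f_t(u)) against a fixed comparator u."""
    T = len(targets)
    return sum(
        lam ** (T - t) * ((w - z) ** 2 - (u - z) ** 2)
        for t, (w, z) in enumerate(zip(iterates, targets), start=1)
    )


# Illustrative run: a constant target, comparator equal to that target.
targets = [0.5] * 50
iterates = ogd(targets, eta=0.1)
undiscounted = discounted_regret(iterates, targets, u=0.5, lam=1.0)
discounted = discounted_regret(iterates, targets, u=0.5, lam=0.5)
```

With $\lambda = 1$ the quantity reduces to ordinary static regret. With smaller $\lambda$, early mistakes are discounted away, so in this run the discounted value is much smaller than the undiscounted one.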

The doubly-uniform question in this setting is whether one can adapt to an unknown discount factor. The interval considered is

$w$ 6

and Smoothed OGD (SOGD) is shown to satisfy, for every comparator and all $\lambda$ in this interval,

$w$ 9

where $\varepsilon$ 0 and $\varepsilon$ 1 in the theorem statement, yielding dominant order

$$O\!\left(\sqrt{\frac{\log T}{1-\lambda}}\right)$$

uniformly across all $\lambda$ in the interval (Yang et al., 26 May 2025).

The algorithmic construction uses a geometric grid

$\varepsilon$ 4

with one OGD expert per grid point, and sequentially aggregates them by Discounted-Normal-Predictor with conservative updating (DNP-cu). The combiner forms

$\varepsilon$ 5

and the crucial technical fact is that DNP-cu can aggregate experts even when they optimize discounted regret with different discount factors (Yang et al., 26 May 2025).
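The grid idea can be illustrated as follows. The sketch assumes a doubling grid over the effective window length $1/(1-\lambda)$, that is, $\lambda_i = 1 - 2^{-i}$ for $i = 1, \dots, \lceil \log_2 T \rceil$. This is a hypothetical grid chosen for illustration and need not match the paper's exact construction. The DNP-cu combiner itself is not implemented here.

```python
import math


def geometric_grid(T):
    """Hypothetical doubling grid: lambda_i = 1 - 2^(-i), i = 1..ceil(log2 T)."""
    n = max(1, math.ceil(math.log2(T)))
    return [1.0 - 2.0 ** (-i) for i in range(1, n + 1)]


def nearest_expert(lam, grid):
    """Index of the grid point whose 1 - lambda_i is closest to 1 - lam in log scale."""
    return min(
        range(len(grid)),
        key=lambda i: abs(math.log((1.0 - grid[i]) / (1.0 - lam))),
    )


grid = geometric_grid(1024)        # 10 experts for T = 1024
i = nearest_expert(0.9, grid)      # expert tuned closest to lambda = 0.9
```

Under this assumed grid, the number of experts grows only logarithmically in $T$, and every $\lambda$ between the smallest and largest grid point has a grid point whose $1 - \lambda_i$ is within a factor of 2 of $1 - \lambda$.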

Here, “doubly-uniform” has a precise local meaning: the bound is uniform in $\lambda$ over a continuous interval and uniform in $T$ up to the explicit $\sqrt{\log T}$ adaptivity overhead. Relative to known-$\lambda$ OGD, the price of adaptivity is that factor $\sqrt{\log T}$ (Yang et al., 26 May 2025).

3. Online linear regression, self-normalization, and scale invariance

In online linear regression with square loss, doubly-uniform regret is defined differently. The protocol is

$T$ 1

A regret bound is uniform over $w$ if it holds simultaneously for all $w \in \mathbb{R}^d$ without explicit dependence on the comparator norm, and it is doubly-uniform if it is additionally scale-invariant under rescaling of the covariates (Chen et al., 2 May 2026).

The analytical object underlying this definition is the self-normalized quantity

$T$ 7

Under scaling $T$ 8, one has $T$ 9 and $n$ 0, so

$n$ 1

showing that the self-normalized ratio is intrinsically scale-invariant (Chen et al., 2 May 2026).
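The invariance is easy to check numerically. The sketch below assumes the one-dimensional self-normalized ratio $\bigl(\sum_t x_t \epsilon_t\bigr)^2 / \sum_t x_t^2$ and a scalar rescaling $x_t \mapsto c\,x_t$ with $c > 0$. Both are assumptions made for illustration and may differ in notation from the paper's definition. The numerator scales by $c^2$ and so does the denominator, so the ratio is unchanged.

```python
import math
import random


def self_normalized_ratio(xs, eps):
    """(sum_t x_t * eps_t)^2 / sum_t x_t^2 for scalar covariates xs and noise eps."""
    s = sum(x * e for x, e in zip(xs, eps))
    v = sum(x * x for x in xs)
    return s * s / v


random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(200)]
eps = [random.choice((-1.0, 1.0)) for _ in range(200)]   # Rademacher noise

base = self_normalized_ratio(xs, eps)
rescaled = [self_normalized_ratio([c * x for x in xs], eps) for c in (1e-3, 7.0, 1e3)]
# Every entry of `rescaled` agrees with `base` up to floating-point error.
```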

The main structural result is dimension-dependent. In dimension $d = 1$, nontrivial scale-invariant self-normalized bounds exist without boundedness or moment assumptions on the covariates beyond predictability. Specifically, for any dyadic martingale and any $n$ 3,

$n$ 4

and therefore

$n$ 5

This leads to an explicit algorithm with deterministic regret

$n$ 6

for $n$ 7 and $n$ 8, uniform in $n$ 9 and scale-invariant in $k$ 0 (Chen et al., 2 May 2026).

For $d \ge 2$, the paper proves impossibility in full generality. For any $k$ 2 and $k$ 3, there exists a dyadic martingale such that

$k$ 4

and this transfers to regret lower bounds showing that sublinear doubly-uniform regret is impossible without additional assumptions (Chen et al., 2 May 2026). The obstruction is geometric: the adversary can inject energy in directions orthogonal to the current information direction.

Under a smoothness condition on the conditional covariate laws,

$k$ 5

sublinear regret reappears in $d \ge 2$. The unregularized VAW predictor then satisfies

$k$ 7

with probability at least $1 - \delta$, and the self-normalized concentration inequality becomes

$k$ 9

without a regularization matrix and without boundedness assumptions on the covariates (Chen et al., 2 May 2026).

This literature also connects directly to an earlier notion of “twice uniform regret.” In sequential linear regression with square loss, uniform regret over $\mathbb{R}^d$ means

$\lambda$ 3

and “twice uniform regret” refers to uniformity over all competitor vectors and worst-case feature and observation sequences (Gaillard et al., 2018). When features are known beforehand, the adapted metric forecaster achieves

$\lambda$ 4

while the minimax lower bound is

$\lambda$ 5

For sequentially revealed features, the parameter-free $\lambda$ 6 variant satisfies

$\lambda$ 7

which yields asymptotic order $d B^2 \ln T$, with $B$ a bound on the observations, for any individual sequence, but a worst-case doubly-uniform bound remains open (Gaillard et al., 2018).

4. Uniformity in accuracy and time in episodic reinforcement learning

In episodic finite-horizon reinforcement learning, the closely related concept is Uniform-PAC. For episodic regret,

$\lambda$ 9

and for PAC-style performance the key count is

$T$ 0

An algorithm is Uniform-PAC if, for $\delta > 0$,

$T$ 2

with one event controlling all $\varepsilon > 0$ (Dann et al., 2017).

The UBEV algorithm achieves, with probability at least $1 - \delta$, simultaneously for all $\varepsilon > 0$,

$T$ 6

The conversion theorem then shows that if one has a bound of the form

$T$ 7

then on the same high-probability event one also has, for all $T$ simultaneously,

$T$ 9

For UBEV, this yields

$w \in \mathbb{R}^d$ 0

for all $T$ with probability at least $1 - \delta$ (Dann et al., 2017).
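The link between mistake counts and regret can be seen in a deliberately crude form. Writing $\Delta_k$ for the optimality gap of episode $k$ and $N_\varepsilon$ for the number of episodes with $\Delta_k > \varepsilon$, every episode contributes at most $\varepsilon$ unless it is counted in $N_\varepsilon$, in which case it contributes at most $\Delta_{\max}$. Hence $\sum_{k \le T} \Delta_k \le \varepsilon T + \Delta_{\max} N_\varepsilon$ for every $\varepsilon$ and $T$ at once. The sketch below checks this on a hypothetical gap sequence. It is not the conversion theorem of Dann et al., which is sharper, and all names are illustrative.

```python
def num_suboptimal(gaps, eps):
    """N_eps: number of episodes whose optimality gap exceeds eps."""
    return sum(1 for g in gaps if g > eps)


def regret(gaps, T):
    """Cumulative regret over the first T episodes."""
    return sum(gaps[:T])


def crude_regret_bound(gaps, T, eps, gap_max):
    """eps * T + gap_max * N_eps, valid whenever every gap is at most gap_max."""
    return eps * T + gap_max * num_suboptimal(gaps[:T], eps)


# Hypothetical gap sequence decaying like 1/sqrt(k), capped at gap_max.
gap_max = 1.0
gaps = [min(gap_max, 2.0 / (k ** 0.5)) for k in range(1, 1001)]

checks = [
    regret(gaps, T) <= crude_regret_bound(gaps, T, eps, gap_max)
    for T in (10, 100, 1000)
    for eps in (0.05, 0.1, 0.5)
]
```

Because the inequality holds for every $\varepsilon$ on the same sequence, one can pick $\varepsilon$ after the fact for each $T$, which is the reason a single Uniform-PAC event yields an anytime regret bound.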

The paper does not use the phrase “doubly-uniform regret,” but the property is explicit: uniformity in $\varepsilon$ through Uniform-PAC and uniformity in $T$ through anytime high-probability regret. The technical mechanism is time-uniform concentration, including finite-time law-of-the-iterated-logarithm style confidence widths such as

$w \in \mathbb{R}^d$ 5

This use of “double uniformity” differs from the OCO and regression usages, but it preserves the same structural theme: a single event controls an entire continuum of thresholds and all horizons (Dann et al., 2017).

5. Dynamic contextual pricing and two-class uniformity

In dynamic contextual pricing under doubly nonparametric random utility models, the data are contexts, prices, and binary purchases. The model is

$w \in \mathbb{R}^d$ 9

with both the mean utility function and the noise CDF unknown and modeled nonparametrically. Revenue is

$T$ 02

and regret is

$T$ 03

with oracle price

$T$ 04

(Chen et al., 2024).

Identification is based on two population equations under uniform random exploration prices. Writing $T$ 05,

$T$ 06

The oracle pricing map is expressed through

$T$ 07

so that

$T$ 08

under the regularity assumption $T$ 09 (Chen et al., 2024).

The “doubly-uniform” aspect has two layers. First, the estimators achieve uniform sup-norm control over their domains. For DNN,

$T$ 10

and for TDNN,

$T$ 11

The kernel estimators of $T$ 12 and $T$ 13 are also controlled uniformly over $T$ 14 and over $T$ 15 in a neighborhood (Chen et al., 2024).

Second, the regret bounds are uniform over both nonparametric classes: $T$ 16 for the DNN-based policy, and

$T$ 17

for the TDNN-based policy (Chen et al., 2024).

The paper explicitly characterizes this as “doubly-uniform” because the bounds are uniform over both the mean utility class and the noise distribution class. The analysis combines uniform convergence, stability of $T$ 18, and a second-order expansion showing that per-period regret scales quadratically in the price error: $T$ 19 This makes the interaction between context dimension $T$ 20 and noise smoothness $T$ 21 explicit in the regret exponent (Chen et al., 2024).
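The quadratic dependence on the price error can be checked on a toy instance. The sketch assumes a purchase occurs when mean utility plus noise exceeds the price, takes the noise to be standard logistic, and fixes one context with mean utility `m`, so that expected revenue is $p\,(1 - F(p - m))$. The logistic choice, the value of `m`, and the function names are assumptions for illustration only, since the paper treats both objects nonparametrically.

```python
import math


def revenue(p, m):
    """Expected revenue p * (1 - F(p - m)) with F the standard logistic CDF."""
    return p / (1.0 + math.exp(p - m))


def oracle_price(m, lo=1e-6, hi=50.0, iters=200):
    """Bisection on the first-order condition 1/p = F(p - m).

    For logistic noise, 1/p - F(p - m) is strictly decreasing in p,
    so it has a single root, which maximizes the revenue.
    """
    def foc(p):
        return 1.0 / p - 1.0 / (1.0 + math.exp(-(p - m)))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if foc(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def price_regret(delta, m):
    """Per-period revenue loss from charging the oracle price plus an error delta."""
    p_star = oracle_price(m)
    return revenue(p_star, m) - revenue(p_star + delta, m)


m = 1.0
ratio = price_regret(0.10, m) / price_regret(0.05, m)
```

Doubling the price error roughly quadruples the loss, so `ratio` is close to 4. The first-order term vanishes at the oracle price, leaving the second-order term to dominate.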

A recent OCO formulation uses “doubly-uniform” to mean simultaneous universality across curvature classes and adaptivity to gradient variation $V_T$ (Zhao et al., 25 Nov 2025). In this setting,

$$V_T = \sum_{t=2}^{T} \sup_{x \in \mathcal{X}} \bigl\|\nabla f_t(x) - \nabla f_{t-1}(x)\bigr\|^2.$$

The goal is a single algorithm that simultaneously achieves

$$O\bigl(\sqrt{V_T}\bigr), \qquad O(d \log V_T), \qquad O(\log V_T)$$

for convex, exp-concave, and strongly convex losses respectively, while also recovering the standard worst-case $T$-based universal rates (Zhao et al., 25 Nov 2025).

UniGrad.Correct and UniGrad.Bregman realize this by maintaining $O(\log T)$ base learners on exponential grids

$T$ 27

UniGrad.Correct achieves

$T$ 28

while UniGrad.Bregman achieves the same curvature-adaptive logarithmic bounds and the optimal convex

$$O\bigl(\sqrt{V_T}\bigr)$$

rate (Zhao et al., 25 Nov 2025). The authors describe this as “universality + adaptivity” simultaneously. This is another instance of double uniformity, now across curvature families and variation regimes.

A different but historically important example appears in the multi-secretary problem. There, a policy has doubly-uniform regret if there exists a constant $C$ independent of both $n$ and $k$ such that

$T$ 33

With finite support $T$ 34 and known probabilities, the adaptive Budget-Ratio policy satisfies

$T$ 35

for all $T$ 36, whereas for non-adaptive policies the regret is generally at least of order $T$ 37: $T$ 38 in a broad budget range (Arlotto et al., 2017). Here the two axes are the number of candidates and the budget.

Taken together, these formulations show that “doubly-uniform regret” is a cross-disciplinary label for regret bounds that are robust in two directions at once: two continuous performance parameters, two scales, two model classes, two difficulty measures, or two resource parameters. The specific axes vary by problem class, but the underlying methodological challenge is consistent: to obtain a single algorithmic guarantee that remains valid without committing in advance to one operating regime.