
Doubly-Uniform Regret: Robustness Across Two Axes

Updated 5 July 2026
  • Doubly-uniform regret is defined as a guarantee that holds simultaneously along two distinct parameters, removing dependence on any single operating regime.
  • It is applied in settings like discounted online convex optimization, linear regression, reinforcement learning, and dynamic pricing, adapting to unknown discount factors, comparator norms, or accuracy thresholds.
  • This framework promotes algorithmic robustness by ensuring a single, unified bound over a continuum of objectives, reducing the need for parameter tuning and enhancing performance guarantees.

Doubly-uniform regret is a family of regret guarantees in which a single bound is required to hold simultaneously along two distinct axes of uncertainty or performance measurement. The term does not have a single universal definition across the literature. In discounted online convex optimization, it refers to uniformity over both the discount factor $\lambda$ and the horizon $T$ (Yang et al., 26 May 2025). In online linear regression, it denotes regret that is simultaneously uniform in the comparator $w$ and invariant to scaling of the covariates (Chen et al., 2 May 2026). In episodic reinforcement learning, the closely related Uniform-PAC framework yields guarantees that are uniform over both accuracy $\varepsilon$ and time $T$ (Dann et al., 2017). In dynamic contextual pricing, the term is used for regret bounds that are uniform over two nonparametric model classes, namely the mean utility function and the noise distribution (Chen et al., 2024). Earlier adjacent usages include “twice uniform regret” in sequential linear regression, meaning uniformity over comparator vectors and worst-case sequences (Gaillard et al., 2018), and uniformly bounded regret in both the number of candidates $n$ and the budget $k$ in the multi-secretary problem (Arlotto et al., 2017). This suggests that “doubly-uniform” is best understood as a structural property: a regret guarantee that removes dependence on two problem parameters or two specification choices at once.

1. Terminological scope and common structure

Across the cited works, doubly-uniform regret is not a single theorem but a recurring design objective. The shared pattern is that the learner is asked to compete without knowing in advance which member of a family of objectives, scales, smoothness classes, or horizons will be the relevant one. In each case, the regret guarantee is required to hold simultaneously, rather than after tuning to a single parameter.

In discounted online convex optimization, the target is a bound that holds “simultaneously for all $\lambda$ in a continuous interval” and is “essentially uniform over $T$” (Yang et al., 26 May 2025). In online linear regression with square loss, the bound must hold “for all $w \in \mathbb{R}^d$ without dependence on TT0” and also be “scale-invariant” under TT1 (Chen et al., 2 May 2026). In Uniform-PAC reinforcement learning, one high-probability event controls both the number of $\varepsilon$-suboptimal episodes for all $\varepsilon > 0$ and the regret for all $T$ (Dann et al., 2017). In dynamic pricing, the uniformity is over two unknown nonparametric objects, TT5 and TT6, with regret statements of the form

TT7

for appropriate smoothness classes (Chen et al., 2024).

A plausible implication is that the phrase identifies a shift from single-parameter adaptivity to simultaneous robustness. Rather than optimizing for one comparator norm, one discount factor, one accuracy threshold, or one model class, the algorithm is analyzed against an entire continuum or product class.

2. Discounted online convex optimization

In discounted online convex optimization, the core performance metric is the $\lambda$-discounted regret

TT9

where recent losses are weighted more heavily than distant ones (Yang et al., 26 May 2025). Under convexity, bounded gradients, bounded domain, and range normalization, Online Gradient Descent with update

ww0

achieves ww1 discounted regret when ww2 is known, using

ww3

with the explicit bound

ww4

The step size and bound are independent of ww5 (Yang et al., 26 May 2025).
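As a rough illustration of the quantities in this section, the sketch below computes a discounted regret and runs projected OGD on one-dimensional quadratic losses. It assumes the common convention that the loss at round $t$ receives weight $\lambda^{T-t}$, so that recent rounds count more. The quadratic losses, the interval domain $[-1, 1]$, the fixed comparator, and the step size `eta` are illustrative choices only and are not the tuning analyzed by Yang et al.

```python
def discounted_regret(losses, plays, comparator, lam):
    """Discounted regret at horizon T = len(losses), weighting round t by lam**(T - t)."""
    T = len(losses)
    total = 0.0
    for t, (f, w) in enumerate(zip(losses, plays), start=1):
        total += lam ** (T - t) * (f(w) - f(comparator))
    return total


def projected_ogd(grads, eta, lo=-1.0, hi=1.0, w0=0.0):
    """Projected online gradient descent on [lo, hi]; returns the point played each round."""
    w, plays = w0, []
    for g in grads:
        plays.append(w)
        # Gradient step followed by projection back onto the interval.
        w = min(hi, max(lo, w - eta * g(w)))
    return plays


# Hypothetical data: f_t(w) = (w - a_t)^2 with targets a_t in [-1, 1].
targets = [0.5, 0.4, 0.6, 0.5, 0.45, 0.55, 0.5, 0.5]
losses = [lambda w, a=a: (w - a) ** 2 for a in targets]
grads = [lambda w, a=a: 2.0 * (w - a) for a in targets]

plays = projected_ogd(grads, eta=0.25)
for lam in (0.5, 0.9, 0.99):
    print(lam, discounted_regret(losses, plays, comparator=0.5, lam=lam))
```

Running the same sequence of plays through several values of `lam` shows how one trajectory is scored differently under each discount factor, which is the situation the adaptive question below addresses.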

The doubly-uniform question in this setting is whether one can adapt to an unknown discount factor. The interval considered is

ww6

and Smoothed OGD (SOGD) is shown to satisfy, for every comparator ww7 and all ww8,

ww9

where ε\varepsilon0 and ε\varepsilon1 in the theorem statement, yielding dominant order

ε\varepsilon2

uniformly across all ε\varepsilon3 (Yang et al., 26 May 2025).

The algorithmic construction uses a geometric grid

ε\varepsilon4

with one OGD expert per grid point, and sequentially aggregates them by Discounted-Normal-Predictor with conservative updating (DNP-cu). The combiner forms

ε\varepsilon5

and the crucial technical fact is that DNP-cu can aggregate experts even when they optimize discounted regret with different discount factors (Yang et al., 26 May 2025).
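The grid idea can be sketched as follows. The exact grid of Yang et al. is not reproduced here. As an assumption, the sketch spaces discount factors so that the effective horizon $1/(1-\lambda)$ doubles between neighbouring points, and the helper names `discount_grid` and `nearest_below` are hypothetical. It only illustrates why a logarithmic number of experts can cover a continuous interval; the DNP-cu aggregation step is not sketched.

```python
def discount_grid(h_min, h_max):
    """Discount factors whose effective horizons 1/(1 - lam) double from h_min up to at least h_max."""
    grid, h = [], float(h_min)
    while True:
        grid.append(1.0 - 1.0 / h)
        if h >= h_max:
            break
        h *= 2.0
    return grid


def nearest_below(grid, lam):
    """Largest grid point not exceeding lam (the grid is increasing)."""
    return max(g for g in grid if g <= lam)


grid = discount_grid(h_min=2, h_max=1000)   # covers lam in [0.5, 0.999]
lam = 0.97
g = nearest_below(grid, lam)
# Ratio of effective horizons between the target lam and its grid neighbour.
ratio = (1.0 / (1.0 - lam)) / (1.0 / (1.0 - g))
print(len(grid), g, ratio)
```

Under this spacing, every discount factor in the covered interval has a grid neighbour whose effective horizon is within a factor of two, while the number of experts grows only logarithmically in the largest effective horizon.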

Here, “doubly-uniform” has a precise local meaning: the bound is uniform in $\lambda$ over a continuous interval and uniform in $T$ up to the explicit ε\varepsilon8 adaptivity overhead. Relative to known-$\lambda$ OGD, the price of adaptivity is that factor TT0 (Yang et al., 26 May 2025).

3. Online linear regression, self-normalization, and scale invariance

In online linear regression with square loss, doubly-uniform regret is defined differently. The protocol is

TT1

A regret bound is uniform over TT2 if it holds simultaneously for all TT3 without explicit dependence on TT4, and it is doubly-uniform if it is additionally scale-invariant under TT5 for any TT6 (Chen et al., 2 May 2026).

The analytical object underlying this definition is the self-normalized quantity

TT7

Under scaling TT8, one has TT9 and nn0, so

nn1

showing that the self-normalized ratio is intrinsically scale-invariant (Chen et al., 2 May 2026).
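The invariance can be checked numerically in one dimension. The sketch assumes the self-normalized ratio has the familiar form $S_T^2 / V_T$ with $S_T = \sum_t \epsilon_t x_t$ and $V_T = \sum_t x_t^2$, and that the scaling acts as $x_t \mapsto c\,x_t$ with $c > 0$. Both are assumptions about notation, and the Gaussian covariates and random signs are hypothetical data.

```python
import random


def self_normalized_ratio(xs, eps):
    """S_T**2 / V_T with S_T = sum(eps_t * x_t) and V_T = sum(x_t**2)."""
    s = sum(e * x for e, x in zip(eps, xs))
    v = sum(x * x for x in xs)
    return s * s / v


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
eps = [rng.choice((-1.0, 1.0)) for _ in range(200)]

base = self_normalized_ratio(xs, eps)
for c in (1e-3, 7.0, 1e4):
    # Scaling multiplies S_T by c and V_T by c**2, so the ratio should not move.
    scaled = self_normalized_ratio([c * x for x in xs], eps)
    print(c, base, scaled)
```

The printed values agree up to floating-point rounding for every `c`, since the numerator and denominator both pick up a factor of $c^2$.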

The main structural result is dimension-dependent. In dimension nn2, nontrivial scale-invariant self-normalized bounds exist without boundedness or moment assumptions on the covariates beyond predictability. Specifically, for any dyadic martingale and any nn3,

nn4

and therefore

nn5

This leads to an explicit algorithm with deterministic regret

nn6

for nn7 and nn8, uniform in nn9 and scale-invariant in kk0 (Chen et al., 2 May 2026).

For kk1, the paper proves impossibility in full generality. For any kk2 and kk3, there exists a dyadic martingale such that

kk4

and this transfers to regret lower bounds showing that sublinear doubly-uniform regret is impossible without additional assumptions (Chen et al., 2 May 2026). The obstruction is geometric: the adversary can inject energy in directions orthogonal to the current information direction.

Under a smoothness condition on the conditional covariate laws,

kk5

sublinear regret reappears in kk6. The unregularized VAW predictor then satisfies

kk7

with probability at least kk8, and the self-normalized concentration inequality becomes

kk9

without a regularization matrix λ\lambda0 and without boundedness assumptions on λ\lambda1 (Chen et al., 2 May 2026).
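To make “unregularized” concrete, here is a minimal one-dimensional sketch of a Vovk-Azoury-Warmuth style forecaster with the regularization term set to zero. The reduction to one dimension, the convention of predicting 0 while all covariates seen so far are zero, and the data are assumptions made for illustration; the source's result concerns the multivariate predictor under the smoothness condition.

```python
def vaw_unregularized_1d(xs, ys):
    """Predictions y_hat_t = x_t * sum_{s<t} y_s x_s / sum_{s<=t} x_s**2 (0 if the denominator is 0)."""
    preds, sxy, sxx = [], 0.0, 0.0
    for x, y in zip(xs, ys):
        sxx += x * x                      # the current covariate enters before predicting
        preds.append(x * sxy / sxx if sxx > 0.0 else 0.0)
        sxy += x * y                      # the label is revealed only after the prediction
    return preds


xs = [0.5, -1.0, 2.0, 0.25, -0.75]
ys = [1.0, -2.1, 3.9, 0.6, -1.4]

preds = vaw_unregularized_1d(xs, ys)
preds_scaled = vaw_unregularized_1d([1000.0 * x for x in xs], ys)
print(preds)
print(preds_scaled)
```

Because no regularization constant sets a reference scale, rescaling every covariate by the same positive factor leaves the predictions unchanged up to rounding, which is the behaviour a scale-invariant regret bound requires of the algorithm.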

This literature also connects directly to an earlier notion of “twice uniform regret.” In sequential linear regression with square loss, uniform regret over λ\lambda2 means

λ\lambda3

and “twice uniform regret” refers to uniformity over all competitor vectors and worst-case feature and observation sequences (Gaillard et al., 2018). When features are known beforehand, the adapted metric forecaster achieves

λ\lambda4

while the minimax lower bound is

λ\lambda5

For sequentially revealed features, the parameter-free λ\lambda6 variant satisfies

λ\lambda7

which yields asymptotic order λ\lambda8 for any individual sequence, but a worst-case doubly-uniform bound remains open (Gaillard et al., 2018).

4. Uniformity in accuracy and time in episodic reinforcement learning

In episodic finite-horizon reinforcement learning, the closely related concept is Uniform-PAC. For episodic regret,

λ\lambda9

and for PAC-style performance the key count is

TT0

An algorithm is Uniform-PAC if, for TT1,

TT2

with one event controlling all TT3 (Dann et al., 2017).

The UBEV algorithm achieves, with probability at least TT4, simultaneously for all TT5,

TT6

The conversion theorem then shows that if one has a bound of the form

TT7

then on the same high-probability event one also has, for all TT8 simultaneously,

TT9

For UBEV, this yields

wRdw \in \mathbb{R}^d0

for all wRdw \in \mathbb{R}^d1 with probability at least wRdw \in \mathbb{R}^d2 (Dann et al., 2017).
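One elementary identity helps explain why controlling the number of $\varepsilon$-suboptimal episodes for every $\varepsilon$ also controls regret: the sum of per-episode optimality gaps equals the integral over $\varepsilon$ of the number of episodes whose gap exceeds $\varepsilon$. The sketch below checks this on hypothetical gaps. It is not the conversion theorem of Dann et al., whose constants and logarithmic factors are not reproduced here.

```python
def count_suboptimal(gaps, eps):
    """Number of episodes whose optimality gap exceeds eps."""
    return sum(1 for g in gaps if g > eps)


def integral_of_counts(gaps):
    """Exact integral over eps in [0, inf) of count_suboptimal(gaps, eps), for nonnegative gaps."""
    levels = sorted(set(gaps) | {0.0})
    total = 0.0
    for lo, hi in zip(levels, levels[1:]):
        # The count is constant on [lo, hi): it equals the number of gaps above lo.
        total += (hi - lo) * count_suboptimal(gaps, lo)
    return total


# Hypothetical per-episode gaps between optimal and achieved value.
gaps = [0.8, 0.5, 0.5, 0.3, 0.1, 0.1, 0.05, 0.0, 0.0, 0.02]
regret = sum(gaps)
print(regret, integral_of_counts(gaps))
```

The two printed numbers coincide up to rounding, so a bound on the count that holds for all thresholds at once can be integrated into a bound on cumulative regret.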

The paper does not use the phrase “doubly-uniform regret,” but the property is explicit: uniformity in $\varepsilon$ through Uniform-PAC and uniformity in $T$ through anytime high-probability regret. The technical mechanism is time-uniform concentration, including finite-time law-of-the-iterated-logarithm style confidence widths such as

wRdw \in \mathbb{R}^d5

This use of “double uniformity” differs from the OCO and regression usages, but it preserves the same structural theme: a single event controls an entire continuum of thresholds and all horizons (Dann et al., 2017).

5. Dynamic contextual pricing and two-class uniformity

In dynamic contextual pricing under doubly nonparametric random utility models, the data are contexts wRdw \in \mathbb{R}^d6, prices wRdw \in \mathbb{R}^d7, and binary purchases wRdw \in \mathbb{R}^d8. The model is

wRdw \in \mathbb{R}^d9

with both TT00 and the noise CDF TT01 unknown and modeled nonparametrically. Revenue is

TT02

and regret is

TT03

with oracle price

TT04

(Chen et al., 2024).

Identification is based on two population equations under uniform random exploration prices. Writing TT05,

TT06

The oracle pricing map is expressed through

TT07

so that

TT08

under the regularity assumption TT09 (Chen et al., 2024).

The “doubly-uniform” aspect has two layers. First, the estimators achieve uniform sup-norm control over their domains. For DNN,

TT10

and for TDNN,

TT11

The kernel estimators of TT12 and TT13 are also controlled uniformly over TT14 and over TT15 in a neighborhood (Chen et al., 2024).

Second, the regret bounds are uniform over both nonparametric classes: TT16 for the DNN-based policy, and

TT17

for the TDNN-based policy (Chen et al., 2024).

The paper explicitly characterizes this as “doubly-uniform” because the bounds are uniform over both the mean utility class and the noise distribution class. The analysis combines uniform convergence, stability of TT18, and a second-order expansion showing that per-period regret scales quadratically in the price error: TT19 This makes the interaction between context dimension TT20 and noise smoothness TT21 explicit in the regret exponent (Chen et al., 2024).
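The quadratic dependence on the price error can be seen on a toy revenue curve. The sketch assumes a purchase occurs when mean utility plus noise is at least the price, so that the purchase probability is $1 - F(p - \mu)$, and it takes $F$ to be the standard logistic CDF with a fixed mean utility `mu`. These are illustrative assumptions and not the model classes or estimators of Chen et al.; `oracle_price` is a hypothetical helper.

```python
import math


def revenue(p, mu):
    """Expected revenue p * P(purchase) under a logistic noise CDF (an illustrative choice)."""
    return p / (1.0 + math.exp(p - mu))


def oracle_price(mu, lo=0.0, hi=20.0, iters=200):
    """Ternary search for the maximizer; valid because this revenue curve is unimodal in p."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if revenue(a, mu) < revenue(b, mu):
            lo = a
        else:
            hi = b
    return 0.5 * (lo + hi)


mu = 2.0
p_star = oracle_price(mu)
for delta in (0.04, 0.02, 0.01):
    regret = revenue(p_star, mu) - revenue(p_star + delta, mu)
    # If regret is quadratic in the price error, this ratio stays roughly constant.
    print(delta, regret, regret / delta ** 2)
```

Halving `delta` cuts the per-period regret by roughly a factor of four, because the first-order term vanishes at the revenue-maximizing price and the leading term is second order.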

A recent OCO formulation uses “doubly-uniform” to mean simultaneous universality across curvature classes and adaptivity to gradient variation TT22 (Zhao et al., 25 Nov 2025). In this setting,

TT23

The goal is a single algorithm that simultaneously achieves

TT24

for convex, exp-concave, and strongly convex losses respectively, while also recovering the standard worst-case TT25-based universal rates (Zhao et al., 25 Nov 2025).
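For intuition about the variation quantity, the sketch below evaluates a gradient-variation measure on two hypothetical loss sequences. It assumes the commonly used form, the sum over rounds of the largest squared difference between consecutive gradients over the domain, approximates the supremum on a finite grid, and uses one-dimensional quadratic losses. None of these choices is taken from Zhao et al.

```python
def gradient_variation(grads, domain):
    """Sum over t >= 2 of max over the domain grid of (grad_t(x) - grad_{t-1}(x))**2."""
    total = 0.0
    for g_prev, g_cur in zip(grads, grads[1:]):
        total += max((g_cur(x) - g_prev(x)) ** 2 for x in domain)
    return total


def quadratic_grad(a):
    """Gradient of the hypothetical loss f(x) = (x - a)**2."""
    return lambda x: 2.0 * (x - a)


domain = [i / 50.0 - 1.0 for i in range(101)]             # grid on [-1, 1]
slow = [quadratic_grad(0.5 + 0.01 * t) for t in range(20)]    # slowly drifting minimizer
fast = [quadratic_grad(0.5 * (-1) ** t) for t in range(20)]   # minimizer flips every round
print(gradient_variation(slow, domain), gradient_variation(fast, domain))
```

The slowly drifting sequence has a variation several orders of magnitude smaller than the alternating one, which is the regime where a variation-dependent bound improves on a worst-case rate in the horizon.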

UniGrad.Correct and UniGrad.Bregman realize this by maintaining TT26 base learners on exponential grids

TT27

UniGrad.Correct achieves

TT28

while UniGrad.Bregman achieves the same curvature-adaptive logarithmic bounds and the optimal convex

TT29

rate (Zhao et al., 25 Nov 2025). The authors describe this as “universality + adaptivity” simultaneously. This is another instance of double uniformity, now across curvature families and variation regimes.

A different but historically important example appears in the multi-secretary problem. There, a policy has doubly-uniform regret if there exists a constant TT30 independent of both TT31 and TT32 such that

TT33

With finite support TT34 and known probabilities, the adaptive Budget-Ratio policy satisfies

TT35

for all TT36, whereas for non-adaptive policies the regret is generally at least of order TT37: TT38 in a broad budget range (Arlotto et al., 2017). Here the two axes are the number of candidates and the budget.
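The role of adaptivity can be illustrated with a simplified ratio-threshold rule that re-reads the remaining budget and the remaining number of candidates at every step. This is only in the spirit of a budget-ratio policy: it is not the exact policy analyzed by Arlotto et al., no claim is made that it attains bounded regret, and the ability levels, probabilities, and function names are hypothetical.

```python
import random

VALUES = (3.0, 2.0, 1.0)          # hypothetical ability levels
PROBS = (0.2, 0.3, 0.5)           # known probabilities of each level


def tail_above(v):
    """P(A > v) under the known distribution."""
    return sum(p for a, p in zip(VALUES, PROBS) if a > v)


def ratio_threshold_policy(candidates, budget):
    """Accept a candidate when P(A > value) is below remaining budget / remaining candidates."""
    picked, k = [], budget
    n = len(candidates)
    for i, v in enumerate(candidates):
        remaining = n - i                     # candidates left, including the current one
        if tail_above(v) < k / remaining:     # never true once the budget k reaches 0
            picked.append(v)
            k -= 1
    return picked


def offline_optimum(candidates, budget):
    """Value of the best `budget` candidates chosen in hindsight."""
    return sum(sorted(candidates, reverse=True)[:budget])


rng = random.Random(1)
for n in (100, 1000, 10000):
    cands = rng.choices(VALUES, weights=PROBS, k=n)
    k = n // 4
    picked = ratio_threshold_policy(cands, k)
    print(n, k, offline_optimum(cands, k) - sum(picked))
```

The printed gap to the hindsight optimum is the empirical regret for each pair of candidate count and budget, which is the quantity a doubly-uniform bound would cap by a constant independent of both.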

Taken together, these formulations show that “doubly-uniform regret” is a cross-disciplinary label for regret bounds that are robust in two directions at once: two continuous performance parameters, two scales, two model classes, two difficulty measures, or two resource parameters. The specific axes vary by problem class, but the underlying methodological challenge is consistent: to obtain a single algorithmic guarantee that remains valid without committing in advance to one operating regime.
