Minimax Regret Lower Bounds

Updated 13 August 2025
  • Minimax regret lower bounds characterize the smallest achievable excess loss (regret) in sequential and statistical decision-making.
  • They unify geometric, duality-based, and information-theoretic approaches to benchmark performance in settings like online convex optimization and bandit learning.
  • Applications include optimal algorithm design in online learning, reinforcement learning, and prediction tasks, with implications drawn from covering numbers and f-divergence analyses.

Minimax regret lower bounds formalize the worst-case performance of decision-making and estimation strategies in sequential and statistical learning, quantitatively specifying the smallest achievable excess loss (regret) relative to a reference class (e.g., best fixed policy, parameter, or expert) under adversarial or stochastic models. These lower bounds constitute a fundamental benchmark for online convex optimization, bandit algorithms, online regression, sequential probability assignment, and general nonparametric estimation, unifying information-theoretic, geometric, and algorithmic perspectives.
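As a minimal concrete instance of this notion of regret, the sketch below (illustrative only, not taken from any cited paper) computes the realized regret of a sequence of plays against the best fixed expert in hindsight:

```python
import numpy as np

def regret_vs_best_expert(losses: np.ndarray, plays: np.ndarray) -> float:
    """Realized regret of a learner against the best fixed expert in hindsight.

    losses: (T, N) array; losses[t, i] is the loss of expert i at round t.
    plays:  (T,) array of expert indices chosen by the learner.
    """
    T = losses.shape[0]
    learner_loss = losses[np.arange(T), plays].sum()  # cumulative learner loss
    best_fixed = losses.sum(axis=0).min()             # best single expert in hindsight
    return float(learner_loss - best_fixed)

# Toy usage: 3 experts, 5 rounds, learner always plays expert 0 (the best one).
losses = np.tile([0.1, 0.9, 0.5], (5, 1))
print(regret_vs_best_expert(losses, np.zeros(5, dtype=int)))  # 0.0
```

The lower bounds surveyed below assert that, for adversarially chosen losses, no strategy can keep this quantity uniformly small.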

1. Geometric and Duality-Based Characterizations

A central approach to minimax regret lower bounds in online convex optimization (OCO) uses minimax duality, connecting the adversarial and stochastic viewpoints. The minimax regret over $T$ rounds is defined by the game value

$$R_T = \inf_{f_1 \in \mathcal{F}} \sup_{z_1 \in \mathcal{Z}} \cdots \inf_{f_T \in \mathcal{F}} \sup_{z_T \in \mathcal{Z}} \left[ \sum_{t=1}^T \ell(z_t, f_t) - \inf_{f \in \mathcal{F}} \sum_{t=1}^T \ell(z_t, f) \right].$$

Von Neumann's minimax theorem allows interchange of infima and suprema, yielding the dual representation

$$R_T = \sup_{p \in \mathcal{P}(\mathcal{Z}^T)} \mathbb{E}\left[ \sum_{t=1}^T \inf_{f_t \in \mathcal{F}} \mathbb{E}\left[\ell(Z_t, f_t) \mid Z_1^{t-1}\right] - \inf_{f \in \mathcal{F}} \sum_{t=1}^T \ell(Z_t, f) \right],$$

where the outer expectation is over $Z_1^T \sim p$.

Defining the concave functional

$$\Phi(p) = \inf_{f \in \mathcal{F}} \mathbb{E}_{Z \sim p}[\ell(Z, f)],$$

and the empirical measure $\hat{p} = (1/T)\sum_{t=1}^T \delta_{Z_t}$, the regret can be compactly written as

$$R_T(p) = \mathbb{E}\left[ \sum_{t=1}^T \Phi(p_t) - T\,\Phi(\hat{p}) \right],$$

where $p_t$ denotes the conditional distribution of $Z_t$ given $Z_1^{t-1}$.

This duality shows that regret can be viewed as the “Jensen gap” for the concave Φ\Phi, i.e.,

$$\Phi\left(\frac{1}{T} \sum_{t=1}^T p_t \right) \geq \frac{1}{T} \sum_{t=1}^T \Phi(p_t).$$

The gap is large when $\Phi$ is non-differentiable (e.g., when the loss class has exposed faces), signifying intrinsic statistical difficulty and leading to lower bounds such as $\Omega(\sqrt{T})$ in general or $\Omega(\sqrt{T\log N})$ for prediction with expert advice over $N$ experts (0903.5328).
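To see the $\sqrt{T}$ rate emerge from the kink of $\Phi$, here is a small Monte Carlo sketch (my own toy instance, assuming binary outcomes and absolute loss, for which $\Phi(p) = \min(p, 1-p)$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Binary outcomes, absolute loss |z - f| with f in [0, 1]:
#   Phi(p) = inf_f E_{Z~Bern(p)} |Z - f| = min(p, 1 - p), kinked at p = 1/2.
# If the adversary plays p_t = 1/2 (the kink) every round, the dual regret is
#   E[ sum_t Phi(p_t) - T * Phi(p_hat) ] = T/2 - E[ min(#ones, #zeros) ],
# and fluctuations of the empirical measure around the kink make this Theta(sqrt(T)).
for T in (100, 400, 1600, 6400):
    ones = rng.binomial(T, 0.5, size=200_000)        # #ones among T fair flips
    gap = T / 2 - np.minimum(ones, T - ones).mean()  # Monte Carlo Jensen gap
    print(f"T={T:5d}  gap={gap:7.2f}  gap/sqrt(T)={gap / np.sqrt(T):.3f}")
```

The ratio stabilizes near $\sqrt{1/(2\pi)} \approx 0.399$, confirming the $\Theta(\sqrt{T})$ Jensen gap at the non-differentiable point.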

2. Information-Theoretic Lower Bounds via $f$-Divergences

For general estimation and prediction problems, minimax risk (or minimax regret) lower bounds are universally captured by reductions to hypothesis testing and the use of $f$-divergences. For a parameter set $\Theta$ with prior $w$ on a finite subset $F \subset \Theta$, the Bayes risk is

$$\bar{r}_w = 1 - \int \max_{\theta \in F} \left( w_\theta\, p_\theta(x) \right) d\mu(x),$$

and for a convex $f$, the central inequality (see Theorem 2.1 in (Guntuboyina, 2010)) is

$$\sum_{\theta \in F} w_\theta D_f(P_\theta \,\|\, Q) \geq W f\left(\frac{1-\bar{r}_w}{W}\right) + (1-W)\, f\left(\frac{\bar{r}_w}{1-W}\right),$$

where $D_f$ is the $f$-divergence, $Q$ is an arbitrary reference probability measure, and $W$ depends on a maximization over the posterior.

Specializations yield Fano’s inequality (using the KL-divergence), Pinsker’s inequality, and risk lower bounds in terms of total variation and global metric entropy. These bounds show that the minimax risk or regret is dictated by the “packing” of the model space under the chosen divergence, relating tightly to estimation tasks involving convex bodies or covariance matrices (Guntuboyina, 2010).
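As a concrete specialization, the sketch below (my illustration; `fano_lower_bound` is a hypothetical helper, not an API from (Guntuboyina, 2010)) evaluates the classical Fano-type bound obtained from the KL divergence, lower-bounding the Bayes testing error over $M$ hypotheses:

```python
import numpy as np

def fano_lower_bound(kl_matrix: np.ndarray) -> float:
    """Fano lower bound on the Bayes testing error over M hypotheses.

    kl_matrix[i, j] = KL(P_i || P_j). Under a uniform prior, the mutual
    information I(theta; X) is at most the average KL(P_i || P_bar), which by
    convexity is at most the average pairwise KL, i.e. kl_matrix.mean().
    """
    M = kl_matrix.shape[0]
    mutual_info_ub = kl_matrix.mean()
    return 1.0 - (mutual_info_ub + np.log(2)) / np.log(M)

# Example: M unit-variance Gaussians with means eps * i, KL = (mu_i - mu_j)^2 / 2.
M, eps = 16, 0.2
mu = eps * np.arange(M)
kl = 0.5 * (mu[:, None] - mu[None, :]) ** 2
print(f"Bayes testing error >= {fano_lower_bound(kl):.3f}")  # ~0.44
```

Shrinking `eps` drives the bound up toward $1 - \log 2/\log M$: tighter packings under the divergence force larger unavoidable error.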

3. Lower Bounds in Online and Bandit Learning

In adversarial online learning and bandit settings, minimax regret lower bounds are obtained by constructing “hard” instances using combinatorial designs and information-theoretic arguments (KL divergence, Pinsker's inequality). For combinatorial prediction with binary vectors under $L_\infty$ or $L_2$ adversaries, worst-case regret lower bounds are

  • Full-information: $R_n \gtrsim d\sqrt{n}$ ($L_\infty$) and $R_n \gtrsim \sqrt{dn}$ ($L_2$),
  • Semi-bandit: the same orders (possibly with extra $\sqrt{\log d}$ factors),
  • Bandit: $R_n \gtrsim d^{3/2}\sqrt{n}$ for well-chosen combinatorial sets.

For classical prediction with expert advice, non-asymptotic lower bounds are derived from maxima of Gaussians and random walks. For $d$ experts and $n$ rounds, with $Z_i^{(n)}$ denoting the endpoint of the $i$-th of $d$ independent symmetric $\pm 1$ random walks after $n$ steps (Orabona et al., 2015),

$$\mathbb{E}\left[\max_{1 \leq i \leq d} Z^{(n)}_i\right] \geq 0.09 \sqrt{n \ln d} - 2\sqrt{n},$$

so any algorithm suffers at least $(1/2)\,\mathbb{E}[\max_i Z^{(n)}_i]$ regret, recovering the correct $\Theta(\sqrt{n \ln d})$ rate.
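A quick simulation (my own numerical check, assuming the $\pm 1$ random-walk reading of $Z_i^{(n)}$ above) estimates the expected maximum directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Endpoint of a length-n symmetric +/-1 walk is 2 * Binomial(n, 1/2) - n,
# so endpoints can be sampled without simulating the n individual steps.
n, d, trials = 10_000, 64, 50_000
Z = 2 * rng.binomial(n, 0.5, size=(trials, d)) - n
estimate = Z.max(axis=1).mean()                  # Monte Carlo E[max_i Z_i^(n)]
gaussian = np.sqrt(2 * n * np.log(d))            # Gaussian-maxima prediction
bound = 0.09 * np.sqrt(n * np.log(d)) - 2 * np.sqrt(n)
# The explicit constants make the stated bound loose (even negative) at
# moderate d, but the sqrt(n ln d) scaling of the maximum is clearly visible.
print(f"estimate={estimate:.1f}  sqrt(2n ln d)={gaussian:.1f}  bound={bound:.1f}")
```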

In high-dimensional bandits with structured action sets, minimax lower bounds inherit extra logarithmic dependencies, e.g., $\Omega(\sqrt{dT \log T \log n})$ in linear contextual bandits (Li et al., 2019).

4. Sequential Probability Assignment and Entropic Lower Bounds

Minimax regret lower bounds in sequential probability assignment under log-loss are governed by the Shtarkov sum, which calculates the “compression redundancy,” and by geometric entropy measures:

$$R_n(Q) = \log \sum_{y \in Y^n} \sup_{q \in Q} q(y).$$
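To make the Shtarkov sum concrete, the sketch below (my example; the i.i.d. Bernoulli class is not treated explicitly in the text) computes $R_n$ exactly for $Q = \{\mathrm{Bern}(\theta)^n : \theta \in [0,1]\}$, recovering the classical $(1/2)\log n$ parametric redundancy:

```python
import numpy as np
from math import lgamma, log

def shtarkov_bernoulli(n: int) -> float:
    """R_n = log sum_y sup_theta q_theta(y) for the i.i.d. Bernoulli class.

    sup_theta theta^k (1 - theta)^(n-k) is attained at the MLE theta = k/n,
    with k the number of ones in y, so the 2^n-term sum collapses to n + 1 terms.
    """
    log_terms = []
    for k in range(n + 1):
        log_binom = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        log_ml = (k * log(k / n) if k > 0 else 0.0) \
               + ((n - k) * log(1 - k / n) if k < n else 0.0)
        log_terms.append(log_binom + log_ml)
    m = max(log_terms)                            # stable log-sum-exp
    return m + log(np.exp(np.array(log_terms) - m).sum())

for n in (10, 100, 1000):
    # For this one-parameter class, R_n = (1/2) ln(n * pi / 2) + o(1).
    print(f"n={n:4d}  R_n={shtarkov_bernoulli(n):.3f}  "
          f"asymptotic={0.5 * log(n * np.pi / 2):.3f}")
```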

A core result is that for function classes with sequential square-root covering entropy $H(Q, \alpha, n)$ scaling as $\tilde{O}(\alpha^{-p})$,

$$R_n(Q) = \tilde{\Omega}\left(n^{p/(p+2)}\right)$$

for $p \leq 2$, with different rates above this threshold (Jia et al., 22 Mar 2025).

This formalism unifies minimax regret with geometric covering approaches: large entropy (richness) in the square-root metric (Hellinger sense) leads directly to higher unavoidable regret.

5. Lower Bound Techniques in Structured and Realizable Settings

In supervised and transductive classification, sample complexity and minimax risk lower bounds are derived using VC-dimension arguments, shattering combinatorics, and binomial approximations. For a hypothesis class of VC dimension $d$, the minimax sample complexity for $\epsilon$-error with probability at least $1 - \delta$ is

$$m \geq \Omega\left( \frac{d}{\epsilon} + \frac{\log(1/\delta)}{\epsilon} \right),$$

which matches the inductive (i.i.d.) supervised sample complexity (Tolstikhin et al., 2016). Crucially, the availability of unlabeled data in transduction or semi-supervised learning does not improve the minimax rate in the worst case.
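As a worked instance of the displayed bound (with the unspecified universal constant set to 1 purely for illustration):

```python
from math import log

# Hypothetical numbers: VC dimension d = 10, error eps = 0.05, confidence delta = 0.01.
d, eps, delta = 10, 0.05, 0.01
m_lower = d / eps + log(1 / delta) / eps  # 200 + ~92.1
print(f"any learner needs on the order of {m_lower:.0f} samples")  # ~292
```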

6. Lower Bounds in Constrained and Non-Stationary Online Settings

Switching-constrained OCO (at most $K$ switches allowed over $T$ rounds) admits a tight minimax regret of order

$$\Theta\left(\frac{T}{\sqrt{K}}\right),$$

with a precise lower bound of $T/\sqrt{2K}$ in one dimension, established via a “fugal game” relaxation (Chen et al., 2019). This absence of a phase transition stands in contrast to the discrete prediction-with-experts setting.

In reinforcement learning for non-stationary, finite-horizon MDPs with $H$ stages, $S$ states, and $A$ actions, new constructions yield a regret lower bound of

$$\Omega\left(\sqrt{H^3 S A T}\right),$$

reflecting the increased exploration difficulty when the transition kernel varies across stages (Domingues et al., 2020).

7. Unifying Features and Implications

  • Minimax regret lower bounds depend on the geometry of the action or hypothesis space, captured by convexity, Lipschitzness, covering/packing numbers, and Bregman/Jensen divergences.
  • Information-theoretic quantities ($f$-divergences, metric entropy, the Shtarkov sum) characterize the minimax rates in estimation, online learning, and sequential prediction.
  • Rate-optimal algorithms must fundamentally adapt to the statistical complexity dictated by corresponding lower bounds—no learning rule can uniformly outperform these benchmarks absent additional problem structure.
  • For practical applications, these bounds quantify the best-possible trade-off between exploration and exploitation, guide optimal algorithm design, and expose the regime(s) where further progress can only come from exploiting structural assumptions beyond worst-case analysis.

| Problem/Class | Minimax Regret Lower Bound | Key Techniques/Quantities |
| --- | --- | --- |
| Online convex optimization (OCO) | $\Omega(\sqrt{T})$ or $\Omega(\sqrt{T\log N})$ | Minimax duality, Jensen gap, Gaussian complexity |
| Bandits (linear/contextual) | $\Omega(\sqrt{dT \log T \log n})$ | Elliptical potential, adversarial instances |
| Classification (VC dimension $d$) | $\Omega(d/m)$ sample complexity | Shattering combinatorics, binomial tails |
| Probability assignment (log-loss) | $\tilde{\Omega}(n^{p/(p+2)})$ ($p$ = entropy exponent) | Shtarkov sum, sequential covering, Hellinger |
| Streaming bandits ($B$ passes) | $\Omega((TB)^\alpha K^{1-\alpha})$ | Arm identification, detection, sample complexity |
| RL (finite MDPs, non-stationary) | $\Omega(\sqrt{H^3SAT})$ | Hard MDP constructions, KL, change-of-measure |

These lower bounds serve as a foundational standard in sequential learning and decision theory, shaping both theoretical investigations and the evaluation of new algorithmic proposals.