Minimax Regret Lower Bounds
- Minimax regret lower bounds formalize the smallest achievable worst-case excess loss in sequential and statistical decision-making.
- They unify geometric, duality-based, and information-theoretic approaches to benchmark performance in settings like online convex optimization and bandit learning.
- Applications include optimal algorithm design in online learning, reinforcement learning, and prediction tasks, with implications drawn from covering numbers and f-divergence analyses.
Minimax regret lower bounds formalize the worst-case performance of decision-making and estimation strategies in sequential and statistical learning, quantitatively specifying the smallest achievable excess loss (regret) relative to a reference class (e.g., best fixed policy, parameter, or expert) under adversarial or stochastic models. These lower bounds constitute a fundamental benchmark for online convex optimization, bandit algorithms, online regression, sequential probability assignment, and general nonparametric estimation, unifying information-theoretic, geometric, and algorithmic perspectives.
1. Geometric and Duality-Based Characterizations
A central approach to minimax regret lower bounds in online convex optimization (OCO) uses minimax duality, connecting the adversarial and stochastic viewpoints. The minimax regret over $T$ rounds is defined by the game value

$$V_T = \inf_{x_1} \sup_{f_1} \cdots \inf_{x_T} \sup_{f_T} \left[ \sum_{t=1}^{T} f_t(x_t) - \inf_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x) \right],$$

where the learner plays $x_t \in \mathcal{X}$ and the adversary picks losses $f_t$ from a class $\mathcal{F}$. Von Neumann's minimax theorem allows interchange of infima and suprema, yielding the dual representation

$$V_T = \sup_{\mathbf{p}} \, \mathbb{E} \left[ \sum_{t=1}^{T} \inf_{x_t \in \mathcal{X}} \mathbb{E}_{f_t \sim p_t}[f_t(x_t)] - \inf_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x) \right],$$

where the supremum ranges over joint distributions $\mathbf{p}$ on loss sequences and $p_t$ is the conditional law of $f_t$ given $f_1, \dots, f_{t-1}$. Defining the concave functional

$$\Phi(p) = \inf_{x \in \mathcal{X}} \mathbb{E}_{f \sim p}[f(x)]$$

and the empirical measure $\hat{p}_T = \frac{1}{T} \sum_{t=1}^{T} \delta_{f_t}$, the regret can be compactly written as

$$V_T = \sup_{\mathbf{p}} \, \mathbb{E} \left[ \sum_{t=1}^{T} \Phi(p_t) - T\, \Phi(\hat{p}_T) \right].$$

This duality shows that regret can be viewed as the "Jensen gap" for the concave $\Phi$: for an i.i.d. adversary $p_t \equiv p$,

$$V_T \geq T \big( \Phi(p) - \mathbb{E}\, \Phi(\hat{p}_T) \big) \geq 0,$$

by Jensen's inequality, since $\mathbb{E}[\hat{p}_T] = p$. The gap is large when $\Phi$ is non-differentiable (e.g., when the loss class has exposed faces), signifying intrinsic statistical difficulty and leading to lower bounds such as $\Omega(\sqrt{T})$ in general or $\Omega(\sqrt{T \log N})$ for expert advice with $N$ actions (0903.5328).
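As a numerical sanity check on this Jensen-gap mechanism, the sketch below sets up a toy two-expert game (an illustrative construction, not taken from the cited paper): $\Phi(p)$ is the best fixed expert's expected loss under the loss distribution $p$, the adversary plays the uniform mixture of two complementary 0/1 loss vectors, where $\Phi$ has a kink, and the gap between $\Phi(p)$ and the average of $\Phi$ at the empirical measure is estimated by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Rows are the adversary's two possible loss vectors over N = 2 experts.
L = np.array([[0.0, 1.0],
              [1.0, 0.0]])

def phi(p):
    # Phi(p) = min_i E_{l ~ p}[l_i]: expected loss of the best fixed expert
    return float((p @ L).min())

p = np.array([0.5, 0.5])   # uniform mixture: Phi is non-differentiable here

T, trials = 1000, 2000
counts = rng.binomial(T, p[0], size=trials)   # draws of loss vector 0 in T rounds
q = counts / T
# Jensen gap Phi(p) - E[Phi(p_hat)] for the empirical measure p_hat
gaps = phi(p) - np.array([phi(np.array([qi, 1 - qi])) for qi in q])
per_round_regret = gaps.mean()
print(T * per_round_regret)   # order sqrt(T), since E|q - 1/2| ~ 1/sqrt(T)
```

The printed total is on the order of $\sqrt{T}$, illustrating how non-differentiability of $\Phi$ at the adversary's mixture translates directly into unavoidable regret.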
2. Information-Theoretic Lower Bounds via $f$-Divergences
For general estimation and prediction problems, minimax risk (or minimax regret) lower bounds are universally captured by reductions to hypothesis testing and the use of $f$-divergences. For a parameter set $\Theta$ with prior $w$ supported on a finite subset $F \subset \Theta$, the Bayes risk is

$$r_w = \inf_{\hat{\theta}} \sum_{\theta \in F} w(\theta)\, \mathbb{E}_{\theta}\big[\ell(\theta, \hat{\theta})\big],$$

and for a convex $f$, the central inequality (see Theorem 2.1 in (Guntuboyina, 2010)) lower-bounds the weighted divergence sum $\sum_{\theta \in F} w(\theta)\, D_f(P_\theta \| Q)$, for every choice of the reference measure $Q$, by an explicit decreasing function of the Bayes risk $r_w$; here $D_f(P \| Q) = \int f\big(\tfrac{dP}{dQ}\big)\, dQ$ is the $f$-divergence, and the bounding function depends on a maximization over the posterior.
Specializations yield Fano’s inequality (using the KL-divergence), Pinsker’s inequality, and risk lower bounds in terms of total variation and global metric entropy. These bounds show that the minimax risk or regret is dictated by the “packing” of the model space under the chosen divergence, relating tightly to estimation tasks involving convex bodies or covariance matrices (Guntuboyina, 2010).
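The Fano specialization is simple enough to evaluate numerically. The sketch below (hypothesis count, separation, and KL values are illustrative assumptions, not from the cited paper) shows how the testing-error lower bound decays with the sample size $n$:

```python
import numpy as np

# Fano's inequality (the KL specialization of the f-divergence reduction):
# with M hypotheses and n i.i.d. samples whose pairwise KLs are at most kl_max,
# any test's worst-case error probability satisfies
#     p_err >= 1 - (n * kl_max + log 2) / log M.
def fano_error_lower_bound(n, kl_max, M):
    return max(0.0, 1.0 - (n * kl_max + np.log(2)) / np.log(M))

# Hypothetical local packing: M = 16 unit-variance Gaussian means with
# pairwise separation at most delta, so KL(N(a,1) || N(b,1)) <= delta**2 / 2.
M, delta = 16, 0.1
kl_max = delta ** 2 / 2
for n in (0, 100, 1000):
    print(n, round(fano_error_lower_bound(n, kl_max, M), 3))
```

With no data ($n = 0$) the bound is $1 - \log 2 / \log M = 0.75$, and it degrades to zero once $n \cdot \mathrm{KL}$ overwhelms $\log M$ — the "packing vs. information" trade-off that drives minimax rates.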
3. Lower Bounds in Online and Bandit Learning
In adversarial online learning and bandit settings, minimax regret lower bounds are obtained by constructing "hard" instances using combinatorial designs and information-theoretic arguments (KL divergence, Pinsker's inequality). For combinatorial prediction with binary action vectors in $\{0,1\}^d$ under $L_\infty$- or $L_2$-bounded adversaries, worst-case regret lower bounds are
- Full-information: $\Omega(d\sqrt{T})$ ($L_\infty$), $\Omega(\sqrt{dT})$ ($L_2$),
- Semi-bandit: the same orders (possibly with extra logarithmic factors),
- Bandit: for well-chosen combinatorial sets, an extra $\sqrt{d}$ factor, i.e., $\Omega(d^{3/2}\sqrt{T})$ under $L_\infty$.
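The KL/Pinsker hard-instance recipe behind such bounds can be sketched in its simplest form: two unit-variance Gaussian arms whose means are swapped between two environments separated by a gap `eps`. All constants below are illustrative; optimizing over the gap recovers the $\Omega(\sqrt{T})$ scaling:

```python
import numpy as np

# Two-environment hard instance for a two-armed Gaussian bandit: the
# environments swap which unit-variance arm is better, with mean gap eps.
# Over T rounds the KL divergence between the induced observation laws is
# at most T * eps**2 / 2, Pinsker gives total variation <= sqrt(KL / 2),
# and whichever environment the learner "bets against" forces
#     regret >= (T * eps / 2) * (1 - TV) / 2.
def regret_lower_bound(T, eps):
    kl = T * eps ** 2 / 2
    tv = min(1.0, np.sqrt(kl / 2))
    return (T * eps / 2) * (1 - tv) / 2

T = 10_000
best = max(regret_lower_bound(T, e) for e in np.linspace(1e-4, 0.1, 2000))
print(best)   # optimizing eps ~ 1/sqrt(T) yields Omega(sqrt(T))
```

The optimizing gap is `eps ~ 1/sqrt(T)`: large enough to cost regret when ignored, small enough to be statistically indistinguishable within $T$ rounds.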
For classical online experts, non-asymptotic lower bounds are derived from Gaussian and random-walk maxima. For $d$ experts and $n$ rounds (Orabona et al., 2015), the minimax regret satisfies

$$R_n^* \geq (1 - o(1))\sqrt{\tfrac{n}{2}\ln d},$$

so any algorithm suffers at least $\Omega(\sqrt{n \ln d})$ regret, recovering the correct $\sqrt{(n/2)\ln d}$ rate.
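The driver of this bound is the growth of Gaussian maxima, $\mathbb{E}[\max_{i \le d} G_i] \to \sqrt{2 \ln d}$ for i.i.d. standard Gaussians, which a quick Monte Carlo (dimensions and trial counts chosen arbitrarily) confirms from below:

```python
import numpy as np

rng = np.random.default_rng(1)

# Estimate E[max of d i.i.d. standard Gaussians] and compare with the
# asymptotic sqrt(2 ln d); an adversary playing random walks over n rounds
# scales this maximum by sqrt(n), yielding the sqrt(n ln d) regret floor.
d, trials = 1024, 4000
samples = rng.standard_normal((trials, d))
est = samples.max(axis=1).mean()
print(est, np.sqrt(2 * np.log(d)))   # estimate approaches the bound from below
```

For $d = 1024$ the empirical mean maximum is around 3.3 against the limit $\sqrt{2 \ln 1024} \approx 3.72$; the lower-order correction vanishes only slowly in $d$.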
In high-dimensional bandits with structured action sets, minimax lower bounds inherit extra logarithmic dependencies, e.g., additional $\sqrt{\log T}$-type factors in linear contextual bandits (Li et al., 2019).
4. Sequential Probability Assignment and Entropic Lower Bounds
Minimax regret lower bounds in sequential probability assignment under log-loss are governed by the Shtarkov sum, which captures the "compression redundancy": the minimax regret for a class $\mathcal{F}$ of sequential predictors equals the log-Shtarkov sum

$$\mathcal{R}_T(\mathcal{F}) = \log \sum_{y^T} \sup_{f \in \mathcal{F}} p_f(y^T),$$

together with geometric entropy measures. A core result is that for function classes whose sequential square-root covering entropy scales as $\gamma^{-p}$, the minimax regret grows as

$$\mathcal{R}_T = \tilde{\Theta}\big(T^{\frac{p}{p+1}}\big)$$

for $p \in (0, 1]$, with different rates above this threshold (Jia et al., 22 Mar 2025).
This formalism unifies minimax regret with geometric covering approaches—large entropy (richness) in the square-root metric (Hellinger sense) leads directly to higher unavoidable regret.
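For intuition, the Shtarkov sum can be computed exactly for the one-parameter Bernoulli class, where the minimax log-loss regret grows like $\frac{1}{2}\log n$ plus a constant (the classical parametric rate). The normalized-maximum-likelihood computation below is a standard sketch:

```python
import math

# Shtarkov sum for the Bernoulli class over binary strings of length n:
#   S_n = sum over y^n of sup_theta p_theta(y^n)
#       = sum_k C(n, k) * (k/n)^k * (1 - k/n)^(n - k),
# and the minimax log-loss regret is log S_n, which grows like
# (1/2) * log n plus a constant for this one-parameter class.
def shtarkov_log_regret(n):
    total = 0.0
    for k in range(n + 1):
        p = (k / n) ** k * (1 - k / n) ** (n - k)   # note 0.0 ** 0 == 1.0
        total += math.comb(n, k) * p
    return math.log(total)

for n in (10, 100, 1000):
    print(n, shtarkov_log_regret(n), 0.5 * math.log(n))
```

The slow logarithmic growth here is the "small class" regime; the entropy-exponent regimes above describe rich nonparametric classes where regret instead grows polynomially in $T$.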
5. Lower Bound Techniques in Structured and Realizable Settings
In supervised and transductive classification, sample complexity and minimax risk lower bounds are derived using VC dimension arguments, shattering combinatorics, and binomial approximations. For a hypothesis class of VC-dimension $d$, the minimax sample complexity for achieving $\varepsilon$-error with probability at least $1 - \delta$ is

$$\Omega\!\left(\frac{d + \log(1/\delta)}{\varepsilon}\right),$$
which matches the inductive (i.i.d.) supervised sample complexity (Tolstikhin et al., 2016). Crucially, the availability of unlabeled data in transduction or semi-supervised learning does not improve the minimax rate in the worst case.
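The shattering combinatorics underlying such bounds can be illustrated with threshold classifiers on the real line, a standard toy class of VC-dimension 1. The brute-force check below enumerates realizable labelings over a small, arbitrarily chosen grid of thresholds:

```python
# Shattering check behind VC lower bounds, for threshold classifiers
# h_t(x) = 1[x >= t] on the real line. Thresholds realize only monotone
# "suffix" labelings, so no two-point set can be shattered: VC dim = 1.
def labelings(points, thresholds):
    return {tuple(int(x >= t) for x in points) for t in thresholds}

def shattered(points, thresholds):
    return len(labelings(points, thresholds)) == 2 ** len(points)

ths = [-1.0, 0.5, 2.0]                # illustrative threshold grid
print(shattered([0.0], ths))          # True: a single point is shattered
print(shattered([0.0, 1.0], ths))     # False: labeling (1, 0) is unrealizable
```

A lower-bound construction then embeds $d$ shatterable points and forces any learner to guess the labels it has not seen, yielding the binomial-tail sample-complexity terms above.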
6. Lower Bounds in Constrained and Non-Stationary Online Settings
Switching-constrained OCO (a budget of at most $K$ switches over $T$ rounds) admits a tight minimax bound of order

$$\Theta\!\left(\frac{T}{\sqrt{K}}\right),$$

with the precise lower bound $\frac{T}{\sqrt{2K}}$ in one dimension, established via a "fugal game" relaxation (Chen et al., 2019). This absence of a phase transition stands in contrast to the discrete prediction-with-experts setting.
In reinforcement learning for non-stationary, finite-horizon MDPs with $H$ stages, $S$ states, and $A$ actions, new constructions yield a regret lower bound

$$\Omega\big(\sqrt{H^3 S A K}\big)$$

over $K$ episodes, reflecting the increased exploration difficulty when the transition kernel varies across stages (Domingues et al., 2020).
7. Unifying Features and Implications
- Minimax regret lower bounds depend on the geometry of the action or hypothesis space (captured by convexity, Lipschitzness, covering/packing numbers, and Bregman/Jensen divergences).
- Information-theoretic quantities ($f$-divergences, metric entropy, the Shtarkov sum) are both necessary and sufficient for characterizing minimax rates in estimation, online learning, and sequential prediction.
- Rate-optimal algorithms must fundamentally adapt to the statistical complexity dictated by corresponding lower bounds—no learning rule can uniformly outperform these benchmarks absent additional problem structure.
- For practical applications, these bounds quantify the best-possible trade-off between exploration and exploitation, guide optimal algorithm design, and expose the regime(s) where further progress can only come from exploiting structural assumptions beyond worst-case analysis.
| Problem/Class | Minimax Regret Lower Bound | Key Techniques/Quantities |
|---|---|---|
| Online convex optimization (OCO) | $\Omega(\sqrt{T})$ or $\Omega(\sqrt{T \log N})$ | Minimax duality, Jensen gap, Gaussian complexity |
| Bandits (linear/contextual) | $\Omega(d\sqrt{T})$ (up to logarithmic factors) | Elliptical potential, adversarial instance |
| Classification (VC dim. $d$) | $\Omega\big((d + \log(1/\delta))/\varepsilon\big)$ sample complexity | Shattering, combinatorics, binomial tail |
| Probability assignment (log-loss) | $\tilde{\Theta}(T^{p/(p+1)})$ ($p$ = entropy exponent) | Shtarkov sum, sequential covering, Hellinger |
| Streaming bandits (limited passes) | — | Arm identification, detection, sample complexity |
| RL (finite MDPs, non-stationary) | $\Omega(\sqrt{H^3 S A K})$ | Hard MDP constructions, KL, change-of-measure |
These lower bounds serve as a foundational standard in sequential learning and decision theory, shaping both theoretical investigations and the evaluation of new algorithmic proposals.