Minimax Regret Lower Bounds
- Minimax regret lower bounds formalize the smallest achievable worst-case excess loss in sequential and statistical decision-making.
- They unify geometric, duality-based, and information-theoretic approaches to benchmark performance in settings like online convex optimization and bandit learning.
- Applications include optimal algorithm design in online learning, reinforcement learning, and prediction tasks, with implications drawn from covering numbers and f-divergence analyses.
Minimax regret lower bounds formalize the worst-case performance of decision-making and estimation strategies in sequential and statistical learning, quantitatively specifying the smallest achievable excess loss (regret) relative to a reference class (e.g., best fixed policy, parameter, or expert) under adversarial or stochastic models. These lower bounds constitute a fundamental benchmark for online convex optimization, bandit algorithms, online regression, sequential probability assignment, and general nonparametric estimation, unifying information-theoretic, geometric, and algorithmic perspectives.
1. Geometric and Duality-Based Characterizations
A central approach to minimax regret lower bounds in online convex optimization (OCO) uses minimax duality, connecting the adversarial and stochastic viewpoints. The minimax regret over $T$ rounds is defined by the game value

$$V_T = \inf_{x_1} \sup_{f_1} \cdots \inf_{x_T} \sup_{f_T} \left[ \sum_{t=1}^{T} f_t(x_t) - \inf_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x) \right],$$

where the learner plays $x_t \in \mathcal{X}$ and the adversary picks losses $f_t$ from a class $\mathcal{F}$. Von Neumann's minimax theorem allows interchange of infima and suprema, yielding the dual representation

$$V_T = \sup_{\mathbf{p}} \, \mathbb{E} \left[ \sum_{t=1}^{T} \inf_{x_t \in \mathcal{X}} \mathbb{E}_{f_t \sim p_t}[f_t(x_t)] - \inf_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x) \right],$$

where the supremum ranges over joint distributions $\mathbf{p}$ on loss sequences and $p_t$ is the conditional law of $f_t$ given $f_1, \dots, f_{t-1}$. Defining the concave functional

$$\Phi(p) = \inf_{x \in \mathcal{X}} \mathbb{E}_{f \sim p}[f(x)]$$

and the empirical measure $\hat{p}_T = \frac{1}{T} \sum_{t=1}^{T} \delta_{f_t}$, the regret can be compactly written as

$$V_T = \sup_{\mathbf{p}} \, \mathbb{E} \left[ \sum_{t=1}^{T} \Phi(p_t) - T\, \Phi(\hat{p}_T) \right].$$

This duality shows that regret can be viewed as the "Jensen gap" for the concave $\Phi$: for an i.i.d. adversary $p_t \equiv p$,

$$V_T \geq T \big( \Phi(p) - \mathbb{E}\, \Phi(\hat{p}_T) \big) \geq 0,$$

by Jensen's inequality, since $\mathbb{E}[\hat{p}_T] = p$. The gap is large when $\Phi$ is non-differentiable (e.g., when the loss class has exposed faces), signifying intrinsic statistical difficulty and leading to lower bounds such as $\Omega(\sqrt{T})$ in general or $\Omega(\sqrt{T \log N})$ for expert advice with $N$ actions (0903.5328).
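As a numerical sanity check on this Jensen-gap mechanism, the sketch below sets up a toy two-expert game (an illustrative construction, not taken from the cited paper): $\Phi(p)$ is the best fixed expert's expected loss under the loss distribution $p$, the adversary plays the uniform mixture of two complementary 0/1 loss vectors, where $\Phi$ has a kink, and the gap between $\Phi(p)$ and the average of $\Phi$ at the empirical measure is estimated by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Rows are the adversary's two possible loss vectors over N = 2 experts.
L = np.array([[0.0, 1.0],
              [1.0, 0.0]])

def phi(p):
    # Phi(p) = min_i E_{l ~ p}[l_i]: expected loss of the best fixed expert
    return float((p @ L).min())

p = np.array([0.5, 0.5])   # uniform mixture: Phi is non-differentiable here

T, trials = 1000, 2000
counts = rng.binomial(T, p[0], size=trials)   # draws of loss vector 0 in T rounds
q = counts / T
# Jensen gap Phi(p) - E[Phi(p_hat)] for the empirical measure p_hat
gaps = phi(p) - np.array([phi(np.array([qi, 1 - qi])) for qi in q])
per_round_regret = gaps.mean()
print(T * per_round_regret)   # order sqrt(T), since E|q - 1/2| ~ 1/sqrt(T)
```

The printed total is on the order of $\sqrt{T}$, illustrating how non-differentiability of $\Phi$ at the adversary's mixture translates directly into unavoidable regret.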
2. Information-Theoretic Lower Bounds via $f$-Divergences
For general estimation and prediction problems, minimax risk (or minimax regret) lower bounds are universally captured by reductions to hypothesis testing and the use of $f$-divergences. For a parameter set $\Theta$ with prior $w$ supported on a finite subset $F \subset \Theta$, the Bayes risk is

$$r_w = \inf_{\hat{\theta}} \sum_{\theta \in F} w(\theta)\, \mathbb{E}_{\theta}\big[\ell(\theta, \hat{\theta})\big],$$

and for a convex $f$, the central inequality (see Theorem 2.1 in (Guntuboyina, 2010)) lower-bounds the weighted divergence sum $\sum_{\theta \in F} w(\theta)\, D_f(P_\theta \| Q)$, for every choice of the reference measure $Q$, by an explicit decreasing function of the Bayes risk $r_w$; here $D_f(P \| Q) = \int f\big(\tfrac{dP}{dQ}\big)\, dQ$ is the $f$-divergence, and the bounding function depends on a maximization over the posterior.
Specializations yield Fano’s inequality (using the KL-divergence), Pinsker’s inequality, and risk lower bounds in terms of total variation and global metric entropy. These bounds show that the minimax risk or regret is dictated by the “packing” of the model space under the chosen divergence, relating tightly to estimation tasks involving convex bodies or covariance matrices (Guntuboyina, 2010).
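The Fano specialization is simple enough to evaluate numerically. The sketch below (hypothesis count, separation, and KL values are illustrative assumptions, not from the cited paper) shows how the testing-error lower bound decays with the sample size $n$:

```python
import numpy as np

# Fano's inequality (the KL specialization of the f-divergence reduction):
# with M hypotheses and n i.i.d. samples whose pairwise KLs are at most kl_max,
# any test's worst-case error probability satisfies
#     p_err >= 1 - (n * kl_max + log 2) / log M.
def fano_error_lower_bound(n, kl_max, M):
    return max(0.0, 1.0 - (n * kl_max + np.log(2)) / np.log(M))

# Hypothetical local packing: M = 16 unit-variance Gaussian means with
# pairwise separation at most delta, so KL(N(a,1) || N(b,1)) <= delta**2 / 2.
M, delta = 16, 0.1
kl_max = delta ** 2 / 2
for n in (0, 100, 1000):
    print(n, round(fano_error_lower_bound(n, kl_max, M), 3))
```

With no data ($n = 0$) the bound is $1 - \log 2 / \log M = 0.75$, and it degrades to zero once $n \cdot \mathrm{KL}$ overwhelms $\log M$ — the "packing vs. information" trade-off that drives minimax rates.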
3. Lower Bounds in Online and Bandit Learning
In adversarial online learning and bandit settings, minimax regret lower bounds are obtained by constructing "hard" instances using combinatorial designs and information-theoretic arguments (KL divergence, Pinsker's inequality). For combinatorial prediction with binary action vectors in $\{0,1\}^d$ under $L_\infty$- or $L_2$-bounded adversaries, worst-case regret lower bounds are
- Full-information: $\Omega(d\sqrt{T})$ ($L_\infty$), $\Omega(\sqrt{dT})$ ($L_2$),
- Semi-bandit: the same orders (possibly with extra logarithmic factors),
- Bandit: for well-chosen combinatorial sets, an extra $\sqrt{d}$ factor, i.e., $\Omega(d^{3/2}\sqrt{T})$ under $L_\infty$.
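The KL/Pinsker hard-instance recipe behind such bounds can be sketched in its simplest form: two unit-variance Gaussian arms whose means are swapped between two environments separated by a gap `eps`. All constants below are illustrative; optimizing over the gap recovers the $\Omega(\sqrt{T})$ scaling:

```python
import numpy as np

# Two-environment hard instance for a two-armed Gaussian bandit: the
# environments swap which unit-variance arm is better, with mean gap eps.
# Over T rounds the KL divergence between the induced observation laws is
# at most T * eps**2 / 2, Pinsker gives total variation <= sqrt(KL / 2),
# and whichever environment the learner "bets against" forces
#     regret >= (T * eps / 2) * (1 - TV) / 2.
def regret_lower_bound(T, eps):
    kl = T * eps ** 2 / 2
    tv = min(1.0, np.sqrt(kl / 2))
    return (T * eps / 2) * (1 - tv) / 2

T = 10_000
best = max(regret_lower_bound(T, e) for e in np.linspace(1e-4, 0.1, 2000))
print(best)   # optimizing eps ~ 1/sqrt(T) yields Omega(sqrt(T))
```

The optimizing gap is `eps ~ 1/sqrt(T)`: large enough to cost regret when ignored, small enough to be statistically indistinguishable within $T$ rounds.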
For classical online experts, non-asymptotic lower bounds are derived from Gaussian and random-walk maxima. For $d$ experts and $n$ rounds (Orabona et al., 2015), the minimax regret satisfies

$$R_n^* \geq (1 - o(1))\sqrt{\tfrac{n}{2}\ln d},$$

so any algorithm suffers at least $\Omega(\sqrt{n \ln d})$ regret, recovering the correct $\sqrt{(n/2)\ln d}$ rate.
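The driver of this bound is the growth of Gaussian maxima, $\mathbb{E}[\max_{i \le d} G_i] \to \sqrt{2 \ln d}$ for i.i.d. standard Gaussians, which a quick Monte Carlo (dimensions and trial counts chosen arbitrarily) confirms from below:

```python
import numpy as np

rng = np.random.default_rng(1)

# Estimate E[max of d i.i.d. standard Gaussians] and compare with the
# asymptotic sqrt(2 ln d); an adversary playing random walks over n rounds
# scales this maximum by sqrt(n), yielding the sqrt(n ln d) regret floor.
d, trials = 1024, 4000
samples = rng.standard_normal((trials, d))
est = samples.max(axis=1).mean()
print(est, np.sqrt(2 * np.log(d)))   # estimate approaches the bound from below
```

For $d = 1024$ the empirical mean maximum is around 3.3 against the limit $\sqrt{2 \ln 1024} \approx 3.72$; the lower-order correction vanishes only slowly in $d$.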
In high-dimensional bandits with structured action sets, minimax lower bounds inherit extra logarithmic dependencies, e.g., additional $\sqrt{\log T}$-type factors in linear contextual bandits (Li et al., 2019).
4. Sequential Probability Assignment and Entropic Lower Bounds
Minimax regret lower bounds in sequential probability assignment under log-loss are governed by the Shtarkov sum, which captures the "compression redundancy": the minimax regret for a class $\mathcal{F}$ of sequential predictors equals the log-Shtarkov sum

$$\mathcal{R}_T(\mathcal{F}) = \log \sum_{y^T} \sup_{f \in \mathcal{F}} p_f(y^T),$$

together with geometric entropy measures. A core result is that for function classes whose sequential square-root covering entropy scales as $\gamma^{-p}$, the minimax regret grows as

$$\mathcal{R}_T = \tilde{\Theta}\big(T^{\frac{p}{p+1}}\big)$$

for $p \in (0, 1]$, with different rates above this threshold (Jia et al., 22 Mar 2025).
This formalism unifies minimax regret with geometric covering approaches—large entropy (richness) in the square-root metric (Hellinger sense) leads directly to higher unavoidable regret.
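For intuition, the Shtarkov sum can be computed exactly for the one-parameter Bernoulli class, where the minimax log-loss regret grows like $\frac{1}{2}\log n$ plus a constant (the classical parametric rate). The normalized-maximum-likelihood computation below is a standard sketch:

```python
import math

# Shtarkov sum for the Bernoulli class over binary strings of length n:
#   S_n = sum over y^n of sup_theta p_theta(y^n)
#       = sum_k C(n, k) * (k/n)^k * (1 - k/n)^(n - k),
# and the minimax log-loss regret is log S_n, which grows like
# (1/2) * log n plus a constant for this one-parameter class.
def shtarkov_log_regret(n):
    total = 0.0
    for k in range(n + 1):
        p = (k / n) ** k * (1 - k / n) ** (n - k)   # note 0.0 ** 0 == 1.0
        total += math.comb(n, k) * p
    return math.log(total)

for n in (10, 100, 1000):
    print(n, shtarkov_log_regret(n), 0.5 * math.log(n))
```

The slow logarithmic growth here is the "small class" regime; the entropy-exponent regimes above describe rich nonparametric classes where regret instead grows polynomially in $T$.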
5. Lower Bound Techniques in Structured and Realizable Settings
In supervised and transductive classification, sample complexity and minimax risk lower bounds are derived using VC dimension arguments, shattering combinatorics, and binomial approximations. For a hypothesis class of VC-dimension $d$, the minimax sample complexity for achieving $\varepsilon$-error with probability at least $1 - \delta$ is

$$\Omega\!\left(\frac{d + \log(1/\delta)}{\varepsilon}\right),$$
which matches the inductive (i.i.d.) supervised sample complexity (Tolstikhin et al., 2016). Crucially, the availability of unlabeled data in transduction or semi-supervised learning does not improve the minimax rate in the worst case.
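The shattering combinatorics underlying such bounds can be illustrated with threshold classifiers on the real line, a standard toy class of VC-dimension 1. The brute-force check below enumerates realizable labelings over a small, arbitrarily chosen grid of thresholds:

```python
# Shattering check behind VC lower bounds, for threshold classifiers
# h_t(x) = 1[x >= t] on the real line. Thresholds realize only monotone
# "suffix" labelings, so no two-point set can be shattered: VC dim = 1.
def labelings(points, thresholds):
    return {tuple(int(x >= t) for x in points) for t in thresholds}

def shattered(points, thresholds):
    return len(labelings(points, thresholds)) == 2 ** len(points)

ths = [-1.0, 0.5, 2.0]                # illustrative threshold grid
print(shattered([0.0], ths))          # True: a single point is shattered
print(shattered([0.0, 1.0], ths))     # False: labeling (1, 0) is unrealizable
```

A lower-bound construction then embeds $d$ shatterable points and forces any learner to guess the labels it has not seen, yielding the binomial-tail sample-complexity terms above.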
6. Lower Bounds in Constrained and Non-Stationary Online Settings
Switching-constrained OCO (a budget of at most $K$ switches over $T$ rounds) admits a tight minimax bound of order

$$\Theta\!\left(\frac{T}{\sqrt{K}}\right),$$

with the precise lower bound $\frac{T}{\sqrt{2K}}$ in one dimension, established via a "fugal game" relaxation (Chen et al., 2019). This absence of a phase transition stands in contrast to the discrete prediction-with-experts setting.
In reinforcement learning for non-stationary, finite-horizon MDPs with $H$ stages, $S$ states, and $A$ actions, new constructions yield a regret lower bound

$$\Omega\big(\sqrt{H^3 S A K}\big)$$

over $K$ episodes, reflecting the increased exploration difficulty when the transition kernel varies across stages (Domingues et al., 2020).
7. Unifying Features and Implications
- Minimax regret lower bounds depend on the geometry of the action or hypothesis space (captured by convexity, Lipschitzness, covering/packing numbers, and Bregman/Jensen divergences).
- Information-theoretic quantities ($f$-divergences, metric entropy, the Shtarkov sum) are both necessary and sufficient for characterizing minimax rates in estimation, online learning, and sequential prediction.
- Rate-optimal algorithms must fundamentally adapt to the statistical complexity dictated by corresponding lower bounds—no learning rule can uniformly outperform these benchmarks absent additional problem structure.
- For practical applications, these bounds quantify the best-possible trade-off between exploration and exploitation, guide optimal algorithm design, and expose the regime(s) where further progress can only come from exploiting structural assumptions beyond worst-case analysis.
| Problem/Class | Minimax Regret Lower Bound | Key Techniques/Quantities |
|---|---|---|
| Online convex optimization (OCO) | $\Omega(\sqrt{T})$ or $\Omega(\sqrt{T \log N})$ | Minimax duality, Jensen gap, Gaussian complexity |
| Bandits (linear/contextual) | $\Omega(d\sqrt{T})$ (up to logarithmic factors) | Elliptical potential, adversarial instance |
| Classification (VC dim. $d$) | $\Omega\big((d + \log(1/\delta))/\varepsilon\big)$ sample complexity | Shattering, combinatorics, binomial tail |
| Probability assignment (log-loss) | $\tilde{\Theta}(T^{p/(p+1)})$ ($p$ = entropy exponent) | Shtarkov sum, sequential covering, Hellinger |
| Streaming bandits (limited passes) | — | Arm identification, detection, sample complexity |
| RL (finite MDPs, non-stationary) | $\Omega(\sqrt{H^3 S A K})$ | Hard MDP constructions, KL, change-of-measure |
These lower bounds serve as a foundational standard in sequential learning and decision theory, shaping both theoretical investigations and the evaluation of new algorithmic proposals.