
Minimax Expected Regret (MMER)

Updated 8 July 2025
  • MMER is a decision-theoretic criterion that measures the worst-case gap between adaptive online actions and the best fixed decision in hindsight.
  • It leverages convex analysis and minimax duality to connect online learning performance with stochastic empirical risk minimization.
  • MMER underpins robust algorithms in adversarial settings, guiding regret rate bounds for convex optimization and learning applications.

Minimax Expected Regret (MMER) is a decision-theoretic and online learning criterion that quantifies the worst-case performance gap between an adaptive sequence of actions and the best fixed decision in hindsight, under adversarial or uncertain conditions. Unlike classical expected loss minimization, MMER robustly measures a learner’s or decision-maker’s vulnerability to potentially adversarial sequences, providing guarantees relative to an optimal fixed strategy chosen with full knowledge of the observed data.

1. Formal Definition and Duality Foundations

Consider the standard online convex optimization (OCO) game: at each round $t = 1, \ldots, T$, a player selects a decision $f_t \in \mathcal{F}$ and the adversary selects $z_t$ from some set $\mathcal{Z}$. Losses are incurred via a convex function $\ell(z_t, f_t)$. The (instantaneous) regret at step $t$ is the difference between the loss incurred and that of the best fixed strategy in hindsight.

The minimax expected regret after $T$ rounds is defined via the nested min–max game:

$$R^*_T = \inf_{f_{1:T}} \sup_{z_{1:T}} \left\{ \sum_{t=1}^T \ell(z_t, f_t) - \inf_{f \in \mathcal{F}} \sum_{t=1}^T \ell(z_t, f) \right\}$$

However, as shown via minimax duality (0903.5328), this minimax regret can equivalently be expressed in stochastic terms:

$$R^*_T = \sup_{p}\, \mathbb{E}\left[ \sum_{t=1}^T \inf_{f_t \in \mathcal{F}} \mathbb{E}\left[\ell(Z_t, f_t) \mid Z_1, \ldots, Z_{t-1}\right] - \inf_{f \in \mathcal{F}} \sum_{t=1}^T \ell(Z_t, f) \right]$$

where the supremum is over all (possibly adversarial) joint distributions $p$ on $(Z_1, \ldots, Z_T)$ and the outer expectation is over a sequence drawn from $p$. This expression reveals that $R^*_T$ is precisely the worst-case expected gap between online (adaptively conditional) and batch (in-hindsight) minimization.
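To make the definition concrete, here is a minimal numerical sketch (illustrative only; the loss, domain, and step sizes are my own choices, not taken from the cited paper): a projected online-gradient-descent player faces squared losses, and its cumulative loss is compared against the best fixed decision in hindsight.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
z = rng.uniform(-1, 1, size=T)      # adversary's sequence (i.i.d. here for simplicity)

def loss(zt, f):
    return 0.5 * (f - zt) ** 2      # convex loss l(z, f)

# Online player: projected gradient descent with step size ~ 1/sqrt(t).
f_t, player_loss = 0.0, 0.0
for t in range(T):
    player_loss += loss(z[t], f_t)
    grad = f_t - z[t]               # gradient of 0.5*(f - z)^2 in f
    f_t = float(np.clip(f_t - grad / np.sqrt(t + 1), -1.0, 1.0))

# Best fixed decision in hindsight: the cumulative squared loss is minimized
# by the empirical mean (clipped to the domain F = [-1, 1]).
f_star = float(np.clip(z.mean(), -1.0, 1.0))
hindsight_loss = loss(z, f_star).sum()

print(f"R_T = {player_loss - hindsight_loss:.3f}")  # grows sublinearly in T
```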

A pivotal role is played by the concave functional

$$\Phi(p) = \inf_{f \in \mathcal{F}} \mathbb{E}_{Z \sim p}[\ell(Z, f)],$$

leading to the formulation:

$$R^*_T = \sup_{p}\, \mathbb{E}\left[ \sum_{t=1}^T \Phi(p_t) - T\,\Phi(U_T) \right]$$

where $p_t$ is the conditional distribution of $Z_t$ given the past, and $U_T$ is the empirical distribution over the $T$-step sequence.

2. Geometric and Information-Theoretic Interpretations

The MMER has a natural geometric interpretation: it is the "gap in Jensen’s inequality" for the concave functional $\Phi$ evaluated on the sequence of conditionals and their empirical average. Specifically, the dual formulation shows

$$\Phi(U_T) \leq \frac{1}{T}\sum_{t=1}^T \Phi(p_t),$$

with equality only for linear $\Phi$. Thus, MMER quantifies the curvature of $\Phi$, which itself reflects the richness and structure of the loss class or hypothesis space.
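The concavity driving this Jensen gap is easy to verify numerically. The sketch below (a toy finite setup of my own; the loss matrix is random) represents $\Phi$ as a pointwise minimum of maps linear in $p$ and checks the concavity inequality on random mixtures.

```python
import numpy as np

rng = np.random.default_rng(1)
L = rng.uniform(0, 1, size=(5, 8))   # loss matrix: L[z, f] = l(z, f)

def phi(p):
    return (p @ L).min()             # Phi(p) = min over f of E_{Z~p}[l(Z, f)]

def random_simplex(n):
    x = rng.exponential(size=n)      # Dirichlet(1,...,1) sample
    return x / x.sum()

for _ in range(1000):
    p, q = random_simplex(5), random_simplex(5)
    lam = rng.uniform()
    mix = lam * p + (1 - lam) * q
    # Phi is a pointwise minimum of p-linear maps, hence concave:
    assert phi(mix) >= lam * phi(p) + (1 - lam) * phi(q) - 1e-12
print("concavity (Jensen) inequality held on all sampled mixtures")
```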

Geometrically, $-\Phi$ is the support function of the convex hull of the loss vectors:

$$-\Phi(p) = \sigma_{-\ell(\mathcal{F})}(p) = \sup_{-\ell_f \in \mathrm{co}[-\ell(\mathcal{F})]} \langle -\ell_f, p \rangle,$$

making $\Phi$ a "mirroring" of the function class into a concave potential defined over distributions.

The size and shape of this gap govern regret rates:

  • If $\Phi$ is flat (e.g., when the loss is strongly convex or exp-concave), $R^*_T = O(\log T)$.
  • If $\Phi$ is non-differentiable (corresponding to multiple minimizers or faces in the loss set), then $R^*_T = \Omega(\sqrt{T})$, and the optimal regret generally matches the known $\sqrt{T}$ lower bounds in online learning (0903.5328).

3. MMER and Empirical Risk Minimization

The equivalence between minimax regret and stochastic empirical minimization clarifies the connection to learning theory. At each round, the player's conditional minimizer $f_t$ minimizes $\mathbb{E}[\ell(Z_t, f_t) \mid Z_1, \ldots, Z_{t-1}]$, while the batch minimizer $\hat{f}$ minimizes the total observed loss. Thus,

$$R_T(\text{player}) = \sum_{t=1}^T \ell(Z_t, f_t) - \sum_{t=1}^T \ell(Z_t, \hat{f}).$$
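A minimal sketch of this gap, assuming squared loss and i.i.d. Gaussian data (my own illustrative choices): the conditional minimizer at round $t$ is the running mean of the past, the batch minimizer $\hat{f}$ is the overall mean, and the difference of cumulative losses is the player's regret.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 10_000
Z = rng.normal(size=T)

def loss(z, f):
    return 0.5 * (z - f) ** 2

# Online player: at round t, play the minimizer of the past losses
# (for squared loss this is the running mean).
online_loss, running_sum = 0.0, 0.0
for t in range(T):
    f_t = running_sum / t if t > 0 else 0.0
    online_loss += loss(Z[t], f_t)
    running_sum += Z[t]

f_hat = Z.mean()                     # batch (in-hindsight) minimizer
batch_loss = loss(Z, f_hat).sum()

print(f"R_T(player) = {online_loss - batch_loss:.3f}")  # O(log T) here
```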

Minimax duality ensures that the adversary's "worst-case" can be achieved by randomization, showing MMER as a bridge between adversarial and stochastic models.

For example, in online convex optimization, this duality gives explicit MMER upper and lower bounds without requiring an explicit construction of an online learning algorithm.

4. Regret Rates and Structural Conditions

The behavior and rate of growth of MMER are tightly linked to structural properties of the loss:

  • For $\sigma$-strongly convex and $L$-Lipschitz losses:

$$R^*_T \leq \frac{8 L^2}{\sigma} \log T$$

  • For concave but non-smooth $\Phi$, MMER is at least $\Omega(\sqrt{T})$.
  • The shift from logarithmic to $\sqrt{T}$ rates can be understood as stemming from the transition from "flat" to non-smooth functionals $\Phi$; the numerical sketch after this list illustrates both regimes.
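A small empirical check of this dichotomy (all constants and distributions are illustrative, not from the references): follow-the-leader under $1$-strongly convex squared losses stays well within the $\frac{8L^2}{\sigma}\log T$ bound, while online gradient descent under linear losses exhibits $\sqrt{T}$-scale regret.

```python
import numpy as np

rng = np.random.default_rng(3)

def ftl_regret(T):
    """Follow-the-leader on l(z, f) = 0.5*(f - z)^2, z in [-1, 1]."""
    z = rng.uniform(-1, 1, size=T)
    s, online = 0.0, 0.0
    for t in range(T):
        f_t = s / t if t else 0.0          # running mean = empirical minimizer
        online += 0.5 * (f_t - z[t]) ** 2
        s += z[t]
    return online - 0.5 * ((z - z.mean()) ** 2).sum()

def ogd_regret(T):
    """Online gradient descent on linear losses l(z, f) = z*f, z = +/-1."""
    z = rng.choice([-1.0, 1.0], size=T)
    f, online = 0.0, 0.0
    for t in range(T):
        online += z[t] * f
        f = float(np.clip(f - z[t] / np.sqrt(t + 1), -1.0, 1.0))
    return online + abs(z.sum())           # hindsight best: f = -sign(sum z)

for T in (1_000, 4_000, 16_000):
    bound = 8 * 2**2 / 1 * np.log(T)       # 8 L^2 / sigma * log T, sigma=1, L=2
    print(f"T={T:6d}  FTL regret={ftl_regret(T):8.2f} (bound {bound:6.1f})"
          f"  OGD regret={ogd_regret(T):8.2f}")
```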

The geometry of the loss class (e.g., whether $\mathrm{co}[-\ell(\mathcal{F})]$ is strictly convex or has exposed faces) determines the ease with which the sequence of empirical conditionals can deviate from their average, and thereby the attainable regret rate.

5. Generalizations and Decision-Theoretic Variants

Related constructs in decision theory extend MMER to more general settings:

  • Minimax weighted expected regret (MWER): Each distribution over uncertainty receives a weight, extending the standard (unweighted) MMER. Weighted approaches enable finer modeling of confidence and ambiguity in the agent’s beliefs, and their updating mechanism (likelihood updates of weights) yields convergence to classical expected utility under repeated evidence (1210.4853, 1302.5681); a schematic sketch follows this list.
  • Partial monitoring and bandits: MMER is fundamental for deriving problem-specific lower and upper bounds in adversarial bandit problems and more general partial monitoring frameworks. Clean information-theoretic minimax theorems and sharp rates are available, often with matching constants (1902.00470, 2202.10997).
  • Distributionally robust optimization: In robust planning and Markov Decision Processes, the minimax regret approach achieves robust performance guarantees relative to the optimal policy in each possible realization, balancing conservatism with performance (2012.04626, 2410.16013).
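As a schematic illustration of the MWER idea (the exact update and normalization in the cited papers differ in details; the loss matrix, candidate set, and horizon here are hypothetical): weights over candidate distributions are updated by likelihood and renormalized, and the action minimizes the maximum weighted expected regret.

```python
import numpy as np

loss = np.array([[0.0, 1.0],     # loss[a, z]: 2 actions, 2 outcomes
                 [0.6, 0.4]])
cands = np.array([[0.9, 0.1],    # candidate distributions p over outcomes
                  [0.5, 0.5],
                  [0.1, 0.9]])
w = np.ones(len(cands))          # initial weights (fully ambiguous)

def mwer_action(w):
    exp_loss = cands @ loss.T                      # E_p[loss(a)], shape (p, a)
    regret = exp_loss - exp_loss.min(axis=1, keepdims=True)
    # minimize the maximum weighted expected regret over candidates
    return int(np.argmin((w[:, None] * regret).max(axis=0)))

rng = np.random.default_rng(4)
true_p = cands[0]
for step in range(50):
    z = rng.choice(2, p=true_p)                    # observe evidence
    w = w * cands[:, z]                            # likelihood update ...
    w = w / w.max()                                # ... renormalized so max = 1
print("action after evidence:", mwer_action(w), " weights:", np.round(w, 3))
```

Under repeated evidence the weight concentrates on the best-supported candidate, recovering expected-loss minimization in the limit, as the cited convergence results describe.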

6. MMER in Learning, Optimization, and Applications

MMER underpins a variety of results across learning theory:

  • In statistical learning, for function classes of moderate complexity (entropy growth exponent $p \in (0,2)$), the minimax regret matches the minimax risk, while for massive classes ($p > 2$), regret rates are necessarily slower (1308.1147).
  • In combinatorial optimization under uncertainty, introducing randomization in the decision maker’s strategy reduces conservatism and can make the MMER tractable by LP methods (1401.7043); see the LP sketch after this list.
  • In nonstationary bandit problems, MMER quantifies optimal "adaptivity to change," guiding architecture and window size in adaptive-UCB algorithms (2101.08980).
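To illustrate the LP route to randomized minimax regret (the cost matrix is hypothetical, and the formulation is the standard epigraph trick, not necessarily the exact model of 1401.7043): minimize the maximum expected regret over scenarios by optimizing a distribution $x$ over actions.

```python
import numpy as np
from scipy.optimize import linprog

C = np.array([[4.0, 1.0, 6.0],    # C[a, s]: cost of action a in scenario s
              [2.0, 5.0, 3.0],
              [5.0, 3.0, 2.0]])
R = C - C.min(axis=0)             # regret vs the best action per scenario
n_a, n_s = R.shape

# Variables (x_1..x_n, t): minimize t subject to x . R[:, s] <= t for all s.
c = np.r_[np.zeros(n_a), 1.0]
A_ub = np.c_[R.T, -np.ones(n_s)]                 # R[:, s] . x - t <= 0
b_ub = np.zeros(n_s)
A_eq = np.r_[np.ones(n_a), 0.0][None, :]         # probabilities sum to 1
b_eq = [1.0]
bounds = [(0, None)] * n_a + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x, t = res.x[:n_a], res.x[-1]
print("randomized strategy:", np.round(x, 3), " minimax expected regret:", round(t, 3))
```

For this toy data the best deterministic action has worst-case regret 3, while the optimal mixture achieves about 1.79, illustrating how randomization reduces conservatism.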

The deep connection of MMER to information theory, geometry, and convex analysis makes it a central organizing principle for the design and analysis of robust, adaptive, and learning-centered algorithms in adversarial and stochastic environments.

7. Summary Table: MMER—Key Constructs and Implications

| Aspect | MMER Characterization | Implication |
|---|---|---|
| Formal definition | $R^*_T = \sup_p \mathbb{E}\left[\sum_t \Phi(p_t) - T\,\Phi(U_T)\right]$ | Links regret to distributional curvature |
| Geometry of $\Phi$ | Flat (strongly convex/exp-concave) $\implies O(\log T)$; non-smooth $\implies \Omega(\sqrt{T})$ | Structure governs attainable rates |
| Empirical min vs. conditional min | Regret is the gap between online adaptive and best-in-hindsight loss | Duality with empirical risk minimization |
| Stochastic vs. adversarial model | Minimax over adversarial distributions equals a stochastic empirical process | Unifies two key learning paradigms |
| Decision-theoretic variants | MWER, robust planning, partial monitoring | Flexible generalizations |
| Applications | Online convex optimization, statistical learning, bandits, robust planning | Adversarial robustness, adaptivity |

MMER thus provides both the theoretical ceiling for online learning and decision-making algorithms and the underlying conceptual structure for robust adaptive behavior in uncertain environments.