Minimax Expected Regret (MMER)
- MMER is a decision-theoretic criterion that measures the worst-case gap between adaptive online actions and the best fixed decision in hindsight.
- It leverages convex analysis and minimax duality to connect online learning performance with stochastic empirical risk minimization.
- MMER underpins robust algorithms in adversarial settings, guiding regret rate bounds for convex optimization and learning applications.
Minimax Expected Regret (MMER) is a decision-theoretic and online learning criterion that quantifies the worst-case performance gap between an adaptive sequence of actions and the best fixed decision in hindsight, under adversarial or uncertain conditions. Unlike classical expected loss minimization, MMER robustly measures a learner’s or decision-maker’s vulnerability to potentially adversarial sequences, providing guarantees relative to an optimal fixed strategy chosen with full knowledge of the observed data.
1. Formal Definition and Duality Foundations
Consider the standard online convex optimization (OCO) game: at each round $t = 1, \dots, T$, a player selects a decision $w_t \in \mathcal{W}$ and the adversary selects a loss function $\ell_t$ from some set $\mathcal{L}$ of convex functions on $\mathcal{W}$. The player incurs the loss $\ell_t(w_t)$. The (instantaneous) regret at step $t$ is the difference between the loss incurred and that of the best fixed strategy in hindsight.
The minimax expected regret after $T$ rounds is defined via the nested min–max game:

$$V_T \;=\; \inf_{w_1}\,\sup_{\ell_1}\,\cdots\,\inf_{w_T}\,\sup_{\ell_T}\;\left[\sum_{t=1}^{T}\ell_t(w_t)\;-\;\inf_{w\in\mathcal{W}}\sum_{t=1}^{T}\ell_t(w)\right].$$
However, as shown via minimax duality (0903.5328), this minimax regret can equivalently be expressed in stochastic terms:

$$V_T \;=\; \sup_{p}\;\mathbb{E}\left[\sum_{t=1}^{T}\inf_{w_t\in\mathcal{W}}\mathbb{E}\big[\ell_t(w_t)\,\big|\,\ell_1,\dots,\ell_{t-1}\big]\;-\;\inf_{w\in\mathcal{W}}\sum_{t=1}^{T}\ell_t(w)\right],$$

where the supremum is over all (possibly adversarial) joint distributions $p$ on $\mathcal{L}^T$. This expression reveals that $V_T$ is precisely the worst-case expected gap between online (adaptively conditional) and batch (in-hindsight) minimization.
A pivotal role is played by the concave functional

$$\Phi(p) \;=\; \inf_{w\in\mathcal{W}}\;\mathbb{E}_{\ell\sim p}\big[\ell(w)\big],$$

leading to the formulation:

$$V_T \;=\; \sup_{p}\;\mathbb{E}\left[\sum_{t=1}^{T}\Phi(p_t)\;-\;T\,\Phi(\hat{p}_T)\right],$$

where $p_t = p(\cdot \mid \ell_1,\dots,\ell_{t-1})$ is the conditional distribution of $\ell_t$ given the past, and $\hat{p}_T$ is the empirical distribution over the $T$-step sequence $\ell_1,\dots,\ell_T$.
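To make these objects concrete, here is a minimal numerical sketch. All specific choices (the decision set $[0,1]$, the two absolute-value losses $|w - 0|$ and $|w - 1|$, and the i.i.d. Bernoulli adversary) are illustrative assumptions, not part of the duality result. Against an i.i.d. adversary the conditionals are constant, so the dual expression reduces to $T\,\Phi(p) - T\,\mathbb{E}[\Phi(\hat{p}_T)]$, which the code evaluates exactly:

```python
from math import comb, sqrt

def phi(p):
    """Phi(p) = inf_{w in [0,1]} E_{z ~ Bern(p)} |w - z| = min(p, 1 - p).
    Concave in p, with a kink at p = 1/2 (toy loss class, chosen for illustration)."""
    return min(p, 1.0 - p)

def expected_regret_iid(p, T):
    """E[ sum_t Phi(p_t) - T * Phi(hat_p_T) ] under an i.i.d. Bernoulli(p) adversary.
    The conditionals satisfy p_t = p, so the first term is T * Phi(p); the second
    is computed exactly from the Binomial law of the empirical frequency k/T."""
    e_phi_hat = sum(comb(T, k) * p**k * (1 - p)**(T - k) * phi(k / T)
                    for k in range(T + 1))
    return T * phi(p) - T * e_phi_hat

for T in (16, 64, 256):
    gap = expected_regret_iid(0.5, T)
    print(T, round(gap, 3), round(gap / sqrt(T), 3))  # gap grows like sqrt(T)
```

At the kink $p = 1/2$ the gap grows on the order of $\sqrt{T}$, in line with the rate discussion in the next section; at points where this toy $\Phi$ is smooth, the gap stays bounded.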
2. Geometric and Information-Theoretic Interpretations
The MMER has a natural geometric interpretation: it is the "gap in Jensen's inequality" for the concave functional $\Phi$ evaluated on the sequence of conditionals and their empirical average. Since $\mathbb{E}[\hat{p}_T] = \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}[p_t]$, Jensen's inequality for the concave $\Phi$ gives

$$\mathbb{E}\big[\Phi(\hat{p}_T)\big] \;\leq\; \Phi\big(\mathbb{E}[\hat{p}_T]\big),$$

with equality only for linear $\Phi$. The worst-case expected size of this gap is what MMER measures: it quantifies the curvature of $\Phi$, which itself reflects the richness and structure of the loss class or hypothesis space.
Geometrically, in the linear case $\ell(w) = \langle \ell, w\rangle$, $\Phi$ is (up to sign) a support function:

$$\Phi(p) \;=\; \inf_{w\in\mathcal{W}}\,\big\langle \mathbb{E}_{\ell\sim p}[\ell],\, w\big\rangle,$$

a concave function of the mean loss vector, which ranges over the convex hull of the loss vectors. This makes $\Phi$ a "mirroring" of the function class into a concave potential defined over distributions.
The size and shape of this gap govern regret rates:
- If $\Phi$ is flat (e.g., when the loss is strongly convex or exp-concave), $V_T = O(\log T)$.
- If $\Phi$ is non-differentiable (corresponding to multiple minimizers or faces in the loss set), then $V_T = \Theta(\sqrt{T})$, and generally optimal regret matches the known lower bounds in online learning (0903.5328).
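The flat-versus-kinked contrast can be checked numerically. Below, two toy Bernoulli loss classes (my choice, purely for illustration): the absolute loss yields $\Phi(p) = \min(p, 1-p)$ with a kink at $p = 1/2$, while the squared loss yields the smooth $\Phi(p) = p(1-p)$. One-sided difference quotients expose the kink:

```python
def phi_abs(p):
    """Absolute loss |w - z|, z ~ Bern(p), w in [0, 1]: Phi(p) = min(p, 1 - p)."""
    return min(p, 1.0 - p)

def phi_sq(p):
    """Squared loss (w - z)^2, unconstrained w: minimizer w = p, Phi(p) = p(1 - p)."""
    return p * (1.0 - p)

def one_sided_derivs(f, p=0.5, h=1e-6):
    """Right and left difference quotients of f at p."""
    return (f(p + h) - f(p)) / h, (f(p) - f(p - h)) / h

print(one_sided_derivs(phi_abs))  # approx (-1.0, 1.0): kink at 1/2
print(one_sided_derivs(phi_sq))   # both approx 0.0: differentiable at 1/2
```

The mismatched one-sided slopes of `phi_abs` are exactly the non-differentiability that forces the $\sqrt{T}$ regime; the agreeing slopes of `phi_sq` reflect the flat case.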
3. MMER and Empirical Risk Minimization
The equivalence between minimax regret and stochastic empirical minimization clarifies the connection to learning theory. At each round, the player's conditional minimizer minimizes $\mathbb{E}\big[\ell_t(w)\,\big|\,\ell_1,\dots,\ell_{t-1}\big]$, while the batch minimizer minimizes the total observed loss $\sum_{t=1}^{T}\ell_t(w)$. Thus, MMER is exactly the worst-case expected excess of sequential conditional minimization over batch empirical risk minimization.
Minimax duality ensures that the adversary's worst case can be achieved by randomization, establishing MMER as a bridge between adversarial and stochastic models.
For example, in online convex optimization, this duality gives explicit MMER upper and lower bounds without requiring an explicit construction of an online learning algorithm.
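The duality can be verified exactly in the smallest instance. For a one-round game with decisions $w \in [0,1]$ and absolute losses $|w - z|$, $z \in \{0,1\}$ (an illustrative toy choice), the best-in-hindsight loss is always $0$, so regret equals the incurred loss, and a grid search confirms that the primal value $\inf_w \sup_z |w - z|$ matches the dual value $\sup_p \Phi(p)$:

```python
grid = [i / 1000 for i in range(1001)]  # grid over [0, 1]

# Primal: the player commits to w first; the adversary picks the worse z in {0, 1}.
# Since min_w' |w' - z| = 0, the one-round regret is just the incurred loss.
primal = min(max(w, 1.0 - w) for w in grid)

# Dual: the adversary commits to a Bernoulli(p) mixture over {0, 1} first;
# the player's best response attains Phi(p) = min(p, 1 - p).
dual = max(min(p, 1.0 - p) for p in grid)

print(primal, dual)  # both 0.5: exchanging min and max loses nothing here
```

The common value $1/2$ is attained by the player at $w = 1/2$ and by the adversary at $p = 1/2$, the kink of $\Phi$.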
4. Regret Rates and Structural Conditions
The behavior and rate of growth of MMER are tightly linked to structural properties of the loss:
- For $\sigma$-strongly convex and $G$-Lipschitz losses: $V_T = \Theta\!\big(\tfrac{G^2}{\sigma}\log T\big)$.
- For concave but non-smooth $\Phi$, MMER grows as $\Omega(\sqrt{T})$.
- The shift from logarithmic to $\sqrt{T}$ rates can be understood as stemming from the transition from "flat" to "non-smooth" functionals $\Phi$.
The geometry of the loss class (e.g., whether the set of loss vectors is strictly convex or has exposed faces) determines how far the sequence of empirical conditionals can deviate from their average, and thereby the attainable regret rate.
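These rate separations are visible in simulation. The sketch below (a toy setup of my choosing, with a stochastic rather than worst-case adversary, and not the minimax-optimal strategy) runs projected online gradient descent on $[-1, 1]$ against random $\pm 1$ signs: quadratic losses with step $1/t$ exhibit logarithmic regret, while linear losses with step $1/\sqrt{t}$ exhibit $\sqrt{T}$-scale regret:

```python
import math
import random

def ogd_regret(T, kind, seed=0):
    """Projected online gradient descent on W = [-1, 1] vs. i.i.d. random signs z_t.
    kind='quadratic': l_t(w) = (w - z_t)^2 / 2, step 1/t       -> O(log T) regret
    kind='linear':    l_t(w) = z_t * w,        step 1/sqrt(t)  -> O(sqrt(T)) regret
    Illustrative sketch only, not a minimax-optimal algorithm."""
    rng = random.Random(seed)
    z = [rng.choice((-1.0, 1.0)) for _ in range(T)]
    w, cum = 0.0, 0.0
    for t in range(1, T + 1):
        zt = z[t - 1]
        if kind == "quadratic":
            cum += 0.5 * (w - zt) ** 2
            w -= (1.0 / t) * (w - zt)      # gradient step, eta_t = 1/t
        else:
            cum += zt * w
            w -= zt / math.sqrt(t)         # gradient step, eta_t = 1/sqrt(t)
        w = max(-1.0, min(1.0, w))         # project back onto [-1, 1]
    if kind == "quadratic":
        mean = sum(z) / T                  # best fixed w in hindsight
        best = sum(0.5 * (zt - mean) ** 2 for zt in z)
    else:
        best = -abs(sum(z))                # best fixed w is -sign(sum z)
    return cum - best

T = 2500
for kind in ("quadratic", "linear"):
    avg = sum(ogd_regret(T, kind, seed=s) for s in range(50)) / 50
    print(kind, round(avg, 2))
```

Averaged over seeds, the quadratic (strongly convex) case stays within the $\tfrac{G^2}{2\sigma}(1 + \log T)$ bound, while the linear case scales like $\sqrt{T}$ (its expected regret equals $\mathbb{E}|\sum_t z_t| \approx \sqrt{2T/\pi}$ here, since the signs are independent of the player's decisions).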
5. Generalizations and Decision-Theoretic Variants
Related constructs in decision theory extend MMER to more general settings:
- Minimax weighted expected regret (MWER): Each distribution over uncertainty receives a weight, extending the standard (unweighted) MMER. Weighted approaches enable finer modeling of confidence and ambiguity in the agent’s beliefs, and their updating mechanism (likelihood updates of weights) yields convergence to classical expected utility under repeated evidence (1210.4853, 1302.5681).
- Partial monitoring and bandits: MMER is fundamental for deriving problem-specific lower and upper bounds in adversarial bandit problems and more general partial monitoring frameworks. Clean information-theoretic minimax theorems and sharp rates are available, often with matching constants (1902.00470, 2202.10997).
- Distributionally robust optimization: In robust planning and Markov Decision Processes, the minimax regret approach achieves robust performance guarantees relative to the optimal policy in each possible realization, balancing conservatism with performance (2012.04626, 2410.16013).
6. MMER in Learning, Optimization, and Applications
MMER underpins a variety of results across learning theory:
- In statistical learning, for function classes of moderate complexity (empirical entropy growing as $\epsilon^{-p}$ with $p < 2$), the minimax regret matches the minimax risk, while for massive classes ($p > 2$), regret rates are necessarily slower (1308.1147).
- In combinatorial optimization under uncertainty, introducing randomization in the decision maker’s strategy reduces conservatism and can make the MMER tractable by LP methods (1401.7043).
- In nonstationary bandit problems, MMER quantifies optimal "adaptivity to change," guiding architecture and window size in adaptive-UCB algorithms (2101.08980).
The deep connection of MMER to information theory, geometry, and convex analysis makes it a central organizing principle for the design and analysis of robust, adaptive, and learning-centered algorithms in adversarial and stochastic environments.
7. Summary Table: MMER—Key Constructs and Implications
| Aspect | MMER Characterization | Implication |
|---|---|---|
| Formal definition | Worst-case expected gap between sequential conditional minimization and batch minimization | Links regret to distributional curvature |
| Geometry of $\Phi$ | Flat: strongly convex/exp-concave losses, $O(\log T)$; non-smooth: $\Theta(\sqrt{T})$ | Structure governs attainable rates |
| Empirical min vs. conditional min | Regret is the gap between online adaptive and best-in-hindsight loss | Duality with empirical risk minimization |
| Stochastic vs. adversarial model | Minimax over adversarial distributions is equivalent to a stochastic empirical process | Unifies two key learning paradigms |
| Decision-theoretic variants | MWER, robust planning, partial monitoring | Flexible generalizations |
| Applications | Online convex optimization, statistical learning, bandits, robust planning | Adversarial robustness, adaptivity |
MMER thus provides both the theoretical ceiling for online learning and decision-making algorithms and the underlying conceptual structure for robust adaptive behavior in uncertain environments.