
Minimax Expected Regret (MMER)

Updated 8 July 2025
  • MMER is a decision-theoretic criterion that measures the worst-case gap between adaptive online actions and the best fixed decision in hindsight.
  • It leverages convex analysis and minimax duality to connect online learning performance with stochastic empirical risk minimization.
  • MMER underpins robust algorithms in adversarial settings, guiding regret rate bounds for convex optimization and learning applications.

Minimax Expected Regret (MMER) is a decision-theoretic and online learning criterion that quantifies the worst-case performance gap between an adaptive sequence of actions and the best fixed decision in hindsight, under adversarial or uncertain conditions. Unlike classical expected loss minimization, MMER robustly measures a learner’s or decision-maker’s vulnerability to potentially adversarial sequences, providing guarantees relative to an optimal fixed strategy chosen with full knowledge of the observed data.

1. Formal Definition and Duality Foundations

Consider the standard online convex optimization (OCO) game: at each round $t = 1, \ldots, T$, a player selects a decision $f_t \in \mathcal{F}$ and the adversary selects $z_t$ from some set $\mathcal{Z}$. Losses are incurred via a convex function $\ell(z_t, f_t)$. The (instantaneous) regret at step $t$ is the difference between the loss incurred and that of the best fixed strategy in hindsight.

The minimax expected regret after $T$ rounds is defined via the nested min–max game:

$$R^*_T = \inf_{f_{1:T}} \sup_{z_{1:T}} \left\{ \sum_{t=1}^T \ell(z_t, f_t) - \inf_{f \in \mathcal{F}} \sum_{t=1}^T \ell(z_t, f) \right\}$$

However, as shown via minimax duality (0903.5328), this minimax regret can equivalently be expressed in stochastic terms:

$$R^*_T = \sup_{p}\, \mathbb{E}\left[ \sum_{t=1}^T \inf_{f_t \in \mathcal{F}} \mathbb{E}\left[\ell(Z_t, f_t) \mid Z_1, \ldots, Z_{t-1}\right] - \inf_{f \in \mathcal{F}} \sum_{t=1}^T \ell(Z_t, f) \right]$$

where the supremum is over all (possibly adversarial) joint distributions $p$ on $(Z_1, \ldots, Z_T)$ and the outer expectation is over a sequence drawn from $p$. This expression reveals that $R^*_T$ is precisely the worst-case expected gap between online (adaptively conditional) and batch (in-hindsight) minimization.
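To make the definition concrete, here is a minimal numerical sketch (illustrative only; the loss, domain, and step sizes are my own choices, not taken from the cited paper): a projected online-gradient-descent player faces squared losses, and its cumulative loss is compared against the best fixed decision in hindsight.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
z = rng.uniform(-1, 1, size=T)      # adversary's sequence (i.i.d. here for simplicity)

def loss(zt, f):
    return 0.5 * (f - zt) ** 2      # convex loss l(z, f)

# Online player: projected gradient descent with step size ~ 1/sqrt(t).
f_t, player_loss = 0.0, 0.0
for t in range(T):
    player_loss += loss(z[t], f_t)
    grad = f_t - z[t]               # gradient of 0.5*(f - z)^2 in f
    f_t = float(np.clip(f_t - grad / np.sqrt(t + 1), -1.0, 1.0))

# Best fixed decision in hindsight: the cumulative squared loss is minimized
# by the empirical mean (clipped to the domain F = [-1, 1]).
f_star = float(np.clip(z.mean(), -1.0, 1.0))
hindsight_loss = loss(z, f_star).sum()

print(f"R_T = {player_loss - hindsight_loss:.3f}")  # grows sublinearly in T
```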

A pivotal role is played by the concave functional

$$\Phi(p) = \inf_{f \in \mathcal{F}} \mathbb{E}_{Z \sim p}[\ell(Z, f)],$$

leading to the formulation:

$$R^*_T = \sup_{p}\, \mathbb{E}\left[ \sum_{t=1}^T \Phi(p_t) - T\,\Phi(U_T) \right]$$

where $p_t$ is the conditional distribution of $Z_t$ given the past, and $U_T$ is the empirical distribution over the $T$-step sequence.

2. Geometric and Information-Theoretic Interpretations

The MMER has a natural geometric interpretation: it is the "gap in Jensen’s inequality" for the concave functional $\Phi$ evaluated on the sequence of conditionals and their empirical average. Specifically, the dual formulation shows

$$\Phi(U_T) \leq \frac{1}{T}\sum_{t=1}^T \Phi(p_t),$$

with equality only for linear $\Phi$. Thus, MMER quantifies the curvature of $\Phi$, which itself reflects the richness and structure of the loss class or hypothesis space.
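The concavity driving this Jensen gap is easy to verify numerically. The sketch below (a toy finite setup of my own; the loss matrix is random) represents $\Phi$ as a pointwise minimum of maps linear in $p$ and checks the concavity inequality on random mixtures.

```python
import numpy as np

rng = np.random.default_rng(1)
L = rng.uniform(0, 1, size=(5, 8))   # loss matrix: L[z, f] = l(z, f)

def phi(p):
    return (p @ L).min()             # Phi(p) = min over f of E_{Z~p}[l(Z, f)]

def random_simplex(n):
    x = rng.exponential(size=n)      # Dirichlet(1,...,1) sample
    return x / x.sum()

for _ in range(1000):
    p, q = random_simplex(5), random_simplex(5)
    lam = rng.uniform()
    mix = lam * p + (1 - lam) * q
    # Phi is a pointwise minimum of p-linear maps, hence concave:
    assert phi(mix) >= lam * phi(p) + (1 - lam) * phi(q) - 1e-12
print("concavity (Jensen) inequality held on all sampled mixtures")
```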

Geometrically, $-\Phi$ is the support function of the convex hull of the loss vectors:

$$-\Phi(p) = \sigma_{-\ell(\mathcal{F})}(p) = \sup_{-\ell_f \in \mathrm{co}[-\ell(\mathcal{F})]} \langle -\ell_f, p \rangle,$$

making $\Phi$ a "mirroring" of the function class into a concave potential defined over distributions.

The size and shape of this gap govern regret rates:

  • If $\Phi$ is flat (e.g., when the loss is strongly convex or exp-concave), $R^*_T = O(\log T)$.
  • If $\Phi$ is non-differentiable (corresponding to multiple minimizers or faces in the loss set), then $R^*_T = \Omega(\sqrt{T})$, and the optimal regret generally matches the known $\sqrt{T}$ lower bounds in online learning (0903.5328).

3. MMER and Empirical Risk Minimization

The equivalence between minimax regret and stochastic empirical minimization clarifies the connection to learning theory. At each round, the player's conditional minimizer $f_t$ minimizes $\mathbb{E}[\ell(Z_t, f_t) \mid Z_1, \ldots, Z_{t-1}]$, while the batch minimizer $\hat{f}$ minimizes the total observed loss. Thus,

$$R_T(\text{player}) = \sum_{t=1}^T \ell(Z_t, f_t) - \sum_{t=1}^T \ell(Z_t, \hat{f}).$$
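A minimal sketch of this gap, assuming squared loss and i.i.d. Gaussian data (my own illustrative choices): the conditional minimizer at round $t$ is the running mean of the past, the batch minimizer $\hat{f}$ is the overall mean, and the difference of cumulative losses is the player's regret.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 10_000
Z = rng.normal(size=T)

def loss(z, f):
    return 0.5 * (z - f) ** 2

# Online player: at round t, play the minimizer of the past losses
# (for squared loss this is the running mean).
online_loss, running_sum = 0.0, 0.0
for t in range(T):
    f_t = running_sum / t if t > 0 else 0.0
    online_loss += loss(Z[t], f_t)
    running_sum += Z[t]

f_hat = Z.mean()                     # batch (in-hindsight) minimizer
batch_loss = loss(Z, f_hat).sum()

print(f"R_T(player) = {online_loss - batch_loss:.3f}")  # O(log T) here
```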

Minimax duality ensures that the adversary's "worst-case" can be achieved by randomization, showing MMER as a bridge between adversarial and stochastic models.

For example, in online convex optimization, this duality gives explicit MMER upper and lower bounds without requiring an explicit construction of an online learning algorithm.

4. Regret Rates and Structural Conditions

The behavior and rate of growth of MMER are tightly linked to structural properties of the loss:

  • For $\sigma$-strongly convex and $L$-Lipschitz losses:

$$R^*_T \leq \frac{8 L^2}{\sigma} \log T$$

  • For concave but non-smooth $\Phi$, MMER is at least $\Omega(\sqrt{T})$.
  • The shift from logarithmic to $\sqrt{T}$ rates can be understood as stemming from the transition from "flat" to non-smooth functionals $\Phi$; the numerical sketch after this list illustrates both regimes.
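A small empirical check of this dichotomy (all constants and distributions are illustrative, not from the references): follow-the-leader under $1$-strongly convex squared losses stays well within the $\frac{8L^2}{\sigma}\log T$ bound, while online gradient descent under linear losses exhibits $\sqrt{T}$-scale regret.

```python
import numpy as np

rng = np.random.default_rng(3)

def ftl_regret(T):
    """Follow-the-leader on l(z, f) = 0.5*(f - z)^2, z in [-1, 1]."""
    z = rng.uniform(-1, 1, size=T)
    s, online = 0.0, 0.0
    for t in range(T):
        f_t = s / t if t else 0.0          # running mean = empirical minimizer
        online += 0.5 * (f_t - z[t]) ** 2
        s += z[t]
    return online - 0.5 * ((z - z.mean()) ** 2).sum()

def ogd_regret(T):
    """Online gradient descent on linear losses l(z, f) = z*f, z = +/-1."""
    z = rng.choice([-1.0, 1.0], size=T)
    f, online = 0.0, 0.0
    for t in range(T):
        online += z[t] * f
        f = float(np.clip(f - z[t] / np.sqrt(t + 1), -1.0, 1.0))
    return online + abs(z.sum())           # hindsight best: f = -sign(sum z)

for T in (1_000, 4_000, 16_000):
    bound = 8 * 2**2 / 1 * np.log(T)       # 8 L^2 / sigma * log T, sigma=1, L=2
    print(f"T={T:6d}  FTL regret={ftl_regret(T):8.2f} (bound {bound:6.1f})"
          f"  OGD regret={ogd_regret(T):8.2f}")
```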

The geometry of the loss class (e.g., whether $\mathrm{co}[-\ell(\mathcal{F})]$ is strictly convex or has exposed faces) determines the ease with which the sequence of empirical conditionals can deviate from their average, and thereby the attainable regret rate.

5. Generalizations and Decision-Theoretic Variants

Related constructs in decision theory extend MMER to more general settings:

  • Minimax weighted expected regret (MWER): Each distribution over uncertainty receives a weight, extending the standard (unweighted) MMER. Weighted approaches enable finer modeling of confidence and ambiguity in the agent’s beliefs, and their updating mechanism (likelihood updates of weights) yields convergence to classical expected utility under repeated evidence (1210.4853, 1302.5681); a schematic sketch follows this list.
  • Partial monitoring and bandits: MMER is fundamental for deriving problem-specific lower and upper bounds in adversarial bandit problems and more general partial monitoring frameworks. Clean information-theoretic minimax theorems and sharp rates are available, often with matching constants (1902.00470, 2202.10997).
  • Distributionally robust optimization: In robust planning and Markov Decision Processes, the minimax regret approach achieves robust performance guarantees relative to the optimal policy in each possible realization, balancing conservatism with performance (2012.04626, 2410.16013).
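As a schematic illustration of the MWER idea (the exact update and normalization in the cited papers differ in details; the loss matrix, candidate set, and horizon here are hypothetical): weights over candidate distributions are updated by likelihood and renormalized, and the action minimizes the maximum weighted expected regret.

```python
import numpy as np

loss = np.array([[0.0, 1.0],     # loss[a, z]: 2 actions, 2 outcomes
                 [0.6, 0.4]])
cands = np.array([[0.9, 0.1],    # candidate distributions p over outcomes
                  [0.5, 0.5],
                  [0.1, 0.9]])
w = np.ones(len(cands))          # initial weights (fully ambiguous)

def mwer_action(w):
    exp_loss = cands @ loss.T                      # E_p[loss(a)], shape (p, a)
    regret = exp_loss - exp_loss.min(axis=1, keepdims=True)
    # minimize the maximum weighted expected regret over candidates
    return int(np.argmin((w[:, None] * regret).max(axis=0)))

rng = np.random.default_rng(4)
true_p = cands[0]
for step in range(50):
    z = rng.choice(2, p=true_p)                    # observe evidence
    w = w * cands[:, z]                            # likelihood update ...
    w = w / w.max()                                # ... renormalized so max = 1
print("action after evidence:", mwer_action(w), " weights:", np.round(w, 3))
```

Under repeated evidence the weight concentrates on the best-supported candidate, recovering expected-loss minimization in the limit, as the cited convergence results describe.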

6. MMER in Learning, Optimization, and Applications

MMER underpins a variety of results across learning theory:

  • In statistical learning, for function classes of moderate complexity (entropy growth exponent $p \in (0,2)$), the minimax regret matches the minimax risk, while for massive classes ($p > 2$), regret rates are necessarily slower (1308.1147).
  • In combinatorial optimization under uncertainty, introducing randomization in the decision maker’s strategy reduces conservatism and can make the MMER tractable by LP methods (1401.7043); see the LP sketch after this list.
  • In nonstationary bandit problems, MMER quantifies optimal "adaptivity to change," guiding architecture and window size in adaptive-UCB algorithms (2101.08980).
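To illustrate the LP route to randomized minimax regret (the cost matrix is hypothetical, and the formulation is the standard epigraph trick, not necessarily the exact model of 1401.7043): minimize the maximum expected regret over scenarios by optimizing a distribution $x$ over actions.

```python
import numpy as np
from scipy.optimize import linprog

C = np.array([[4.0, 1.0, 6.0],    # C[a, s]: cost of action a in scenario s
              [2.0, 5.0, 3.0],
              [5.0, 3.0, 2.0]])
R = C - C.min(axis=0)             # regret vs the best action per scenario
n_a, n_s = R.shape

# Variables (x_1..x_n, t): minimize t subject to x . R[:, s] <= t for all s.
c = np.r_[np.zeros(n_a), 1.0]
A_ub = np.c_[R.T, -np.ones(n_s)]                 # R[:, s] . x - t <= 0
b_ub = np.zeros(n_s)
A_eq = np.r_[np.ones(n_a), 0.0][None, :]         # probabilities sum to 1
b_eq = [1.0]
bounds = [(0, None)] * n_a + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x, t = res.x[:n_a], res.x[-1]
print("randomized strategy:", np.round(x, 3), " minimax expected regret:", round(t, 3))
```

For this toy data the best deterministic action has worst-case regret 3, while the optimal mixture achieves about 1.79, illustrating how randomization reduces conservatism.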

The deep connection of MMER to information theory, geometry, and convex analysis makes it a central organizing principle for the design and analysis of robust, adaptive, and learning-centered algorithms in adversarial and stochastic environments.

7. Summary Table: MMER—Key Constructs and Implications

| Aspect | MMER Characterization | Implication |
|---|---|---|
| Formal definition | $R^*_T = \sup_p \mathbb{E}\left[\sum_t \Phi(p_t) - T\,\Phi(U_T)\right]$ | Links regret to distributional curvature |
| Geometry of $\Phi$ | Flat (strongly convex/exp-concave) $\implies O(\log T)$; non-smooth $\implies \Omega(\sqrt{T})$ | Structure governs attainable rates |
| Empirical min vs. conditional min | Regret is the gap between online adaptive and best-in-hindsight loss | Duality with empirical risk minimization |
| Stochastic vs. adversarial model | Minimax over adversarial distributions equals a stochastic empirical process | Unifies two key learning paradigms |
| Decision-theoretic variants | MWER, robust planning, partial monitoring | Flexible generalizations |
| Applications | Online convex optimization, statistical learning, bandits, robust planning | Adversarial robustness, adaptivity |

MMER thus provides both the theoretical ceiling for online learning and decision-making algorithms and the underlying conceptual structure for robust adaptive behavior in uncertain environments.