Minimax Learning Formulation
- Minimax Learning Formulation is a framework that optimizes the worst-case risk to ensure robustness in adversarial or uncertain settings.
- It is applied in online learning, multi-objective optimization, supervised learning, reinforcement learning, and meta-learning, providing unifying algorithmic strategies.
- The approach leverages game-theoretic and saddle-point dualities to yield rigorous guarantees, scalable algorithms, and practical performance bounds.
A minimax learning formulation specifies a learning criterion, algorithm, or statistical bound in terms of the worst-case (maximal) value of a risk, loss, regret, or error function over an adversarially chosen environment, task, or data distribution. This formulation is fundamental in online learning, multi-objective optimization, supervised and reinforcement learning, robust estimation, meta-learning, and distributional robustness. The minimax perspective provides both rigorous guarantees (often via game-theoretic or saddle-point dualities) and unifying algorithmic principles for confronting non-stochasticity, adversarial scenarios, multiple objectives, or distributional shift.
1. Foundational Concepts and Formulations
The central principle underlying minimax learning is the optimization of a learner's strategy (e.g., parameters, policies, predictors) to control the worst-case value of a performance metric against a set of adversarial choices (nature, adversary, data distribution, task, or loss coordinate). The canonical settings fit the following general template:
- Supervised Learning: Minimize the worst-case expected loss over an ambiguity set of data distributions,
  $$\min_{f \in \mathcal{F}} \; \max_{P \in \Gamma} \; \mathbb{E}_{P}\big[\ell(f(X), Y)\big],$$
  as in the maximum-conditional-entropy principle (Farnia et al., 2016); see the sketch below.
- Multi-objective Online Learning: The learner and adversary play a $d$-dimensional vector-valued game over $T$ rounds; the learner's AMF-regret compares its accumulated losses to the benchmark in which the adversary must move first at each round, bypassing the classical minimax theorem because the coordinate-wise maximum need not be concave:
  $$\mathrm{Reg}^{\mathrm{AMF}}_{T} \;=\; \max_{j \in [d]} \sum_{t=1}^{T} \ell^{t}_{j}(x^{t}, y^{t}) \;-\; \sum_{t=1}^{T} w^{t},$$
  where $w^{t} = \sup_{y \in \mathcal{Y}} \inf_{x \in \mathcal{X}} \max_{j \in [d]} \ell^{t}_{j}(x, y)$ (Lee et al., 2021).
- Multi-task Learning (MTL): Replace mean task-risk minimization by a max-risk (worst-task) objective,
  $$\min_{h_1, \dots, h_T} \; \max_{t \in [T]} \; \widehat{R}_t(h_t),$$
  embedded in a generalized loss-compositional paradigm that interpolates between the mean and the max via $\ell_p$ norms of the task risks (Mehta et al., 2012).
- Reinforcement Learning / Games: For zero-sum Markov games and adversarial RL, the value to a player (e.g., the expected discounted sum of rewards) is defined as a saddle value over policy classes,
  $$V^{*}(s) \;=\; \max_{\pi^{1}} \min_{\pi^{2}} \; \mathbb{E}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t} r_t \,\Big|\, s_0 = s,\ \pi^{1}, \pi^{2}\Big],$$
  which leads to minimax Q-learning algorithms and convergence theorems for the minimax solution (Diddigi et al., 2019).
- Bilevel Problems: Under strong convexity of the lower-level objective, bilevel programs can be exactly recast as minimax saddle problems, e.g., by penalizing the lower-level optimality gap $g(x, y) - \min_{z} g(x, z)$, which yields a saddle problem over $(x, y)$ and $z$ (Wang et al., 2023).
Minimax rates, risk bounds, and learning algorithms are then analyzed in terms of these worst-case objectives.
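The supervised-learning bullet above can be made concrete with a minimal numerical sketch. The data, the finite ambiguity set of sample reweightings, and the subgradient scheme below are illustrative assumptions, not the construction of Farnia et al. (2016); the sketch only instantiates "minimize the worst-case expected loss over an ambiguity set."

```python
# Minimal sketch of supervised minimax (distributionally robust) learning over
# a finite ambiguity set of reweightings of the training sample.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float) * 2 - 1    # labels in {-1, +1}

# Ambiguity set: three plausible test distributions (sample reweightings).
P = [np.ones(300), (X[:, 0] > 0).astype(float), (X[:, 0] <= 0).astype(float)]
P = [p / p.sum() for p in P]

w = np.zeros(2)
for it in range(500):
    margins = y * (X @ w)
    losses = np.maximum(0.0, 1.0 - margins)                 # per-example hinge loss
    risks = [p @ losses for p in P]
    p_star = P[int(np.argmax(risks))]                        # worst-case distribution
    active = (margins < 1.0).astype(float)
    grad = -(p_star * active * y) @ X                        # subgradient of worst-case risk
    w -= 0.5 / np.sqrt(it + 1) * grad

print(w, max(p @ np.maximum(0.0, 1.0 - y * (X @ w)) for p in P))
```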
2. Principal Theoretical Guarantees and Algorithmic Frameworks
Online Minimax Multiobjective Optimization
Lee et al. (Lee et al., 2021) provide a generic online minimax multiobjective algorithm in which, at each round $t$, the learner aggregates the $d$ coordinates of the vector loss via a "softmax" (exponential-weights) assignment,
$$p^{t}_{j} \;\propto\; \exp\Big(\eta \sum_{s < t} \ell^{s}_{j}(x^{s}, y^{s})\Big), \qquad j \in [d],$$
and plays minimax-optimally for the resulting scalar convex-concave surrogate payoff $\langle p^{t}, \ell^{t}(x, y)\rangle$. The AMF-regret then satisfies an $O(\sqrt{T \log d})$ bound for bounded losses, simultaneously yielding optimal rates for external, internal, swap, interval, and multi-group regrets as well as Blackwell approachability. A minimal sketch of this scheme appears below.
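The following Python sketch instantiates the scheme under simplifying assumptions not made in the paper: finite learner and adversary action sets, a fully observed per-round loss tensor, an adversary simulated as a surrogate best response, and the scalar surrogate game solved with an off-the-shelf LP.

```python
# Minimal sketch of the softmax-surrogate scheme for online minimax
# multiobjective optimization (in the spirit of Lee et al., 2021).
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Row player's minimax mixed strategy for payoff matrix A
    (row player minimizes x^T A y over the simplex)."""
    n, m = A.shape
    c = np.r_[np.zeros(n), 1.0]                      # minimize the value v
    A_ub = np.c_[A.T, -np.ones(m)]                   # (A^T x)_j <= v for every column j
    b_ub = np.zeros(m)
    A_eq = np.r_[np.ones(n), 0.0].reshape(1, -1)     # sum_i x_i = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    x = np.clip(res.x[:n], 0.0, None)
    return x / x.sum()

def amf_play(loss_tensor_rounds, eta=0.1, seed=0):
    """loss_tensor_rounds: list of arrays of shape (n, m, d), the d-dimensional
    vector loss for each (learner action, adversary action) pair."""
    rng = np.random.default_rng(seed)
    d = loss_tensor_rounds[0].shape[2]
    cum = np.zeros(d)                                # cumulative coordinate losses
    for L in loss_tensor_rounds:
        p = np.exp(eta * (cum - cum.max()))          # softmax weights over coordinates
        p /= p.sum()
        surrogate = L @ p                            # scalar convex-concave payoff, shape (n, m)
        x = solve_zero_sum(surrogate)                # learner's minimax mixed strategy
        i = rng.choice(len(x), p=x)                  # sampled learner action
        j = int(np.argmax(L[i] @ p))                 # adversary: surrogate best response
        cum += L[i, j]                               # accumulate realized vector loss
    return cum
```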
Loss-Compositional and Minimax Multi-task Learning
Mehta et al. (Mehta et al., 2012) generalize multi-task learning to minimax and intermediate $\ell_p$-risk objectives. Their main learning-to-learn (LTL) theorem (sketched) implies that controlling the maximum empirical risk on training tasks exponentially reduces the probability that a fresh test task has high excess risk, justifying minimax MTL when worst-case generalization is required. A smoothed instance of the max-of-task-risks objective is sketched below.
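The sketch below minimizes the max-of-task-risks objective via a generic log-sum-exp smoothing; this is a standard smoothing device rather than the specific algorithm of Mehta et al. (2012), and the task models, data, and smoothing parameter are illustrative assumptions.

```python
# Minimax multi-task learning via log-sum-exp smoothing of max_t R_t.
import numpy as np

def minimax_mtl(tasks, dim, beta=20.0, lr=0.05, iters=500):
    """tasks: list of (X, y) pairs. Learns one linear predictor per task by
    (approximately) minimizing (1/beta) * log sum_t exp(beta * R_t)."""
    W = np.zeros((len(tasks), dim))
    for _ in range(iters):
        risks, grads = [], []
        for t, (X, y) in enumerate(tasks):
            resid = X @ W[t] - y
            risks.append(np.mean(resid ** 2))             # task risk R_t
            grads.append(2.0 * X.T @ resid / len(y))      # gradient of R_t
        risks = np.array(risks)
        p = np.exp(beta * (risks - risks.max()))          # softmax over tasks
        p /= p.sum()
        for t in range(len(tasks)):                       # weighted gradient step:
            W[t] -= lr * p[t] * grads[t]                  # hardest tasks get most weight
    return W

# Usage: two synthetic regression tasks.
rng = np.random.default_rng(0)
tasks = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(2)]
W = minimax_mtl(tasks, dim=3)
```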
Convex-linear Minimax via Bandit Strategies
Roux et al. (Roux et al., 2021) present scalable algorithms for minimax convex-linear learning with adversarial distributional aggregation of per-example losses,
$$\min_{w} \; \max_{q \in \mathcal{Q}} \; \sum_{i=1}^{n} q_i \, \ell_i(w),$$
with $\mathcal{Q} \subseteq \Delta_n$ representing the adversary's set (e.g., a capped-sparsity simplex for the max loss, the top-$k$ loss, or DRO). Online full-information OCO and combinatorial bandit approaches are combined to obtain primal-dual gap guarantees with per-round computation scaling only with the support size of the extreme points of $\mathcal{Q}$; the snippet below illustrates this extreme-point sparsity for the top-$k$ loss.
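As a minimal illustration (not taken from the paper): for the top-$k$ loss, $\mathcal{Q}$ is the capped simplex $\{q : 0 \le q_i \le 1/k,\ \sum_i q_i = 1\}$, whose extreme points are uniform over $k$ coordinates, so the adversary's worst case touches only the $k$ largest per-example losses.

```python
# Worst-case distribution for the top-k loss is an extreme point of the
# capped simplex: uniform over the k largest per-example losses.
import numpy as np

def topk_worst_case_q(losses, k):
    q = np.zeros_like(losses)
    idx = np.argpartition(losses, -k)[-k:]   # indices of the k largest losses
    q[idx] = 1.0 / k                         # extreme point of the capped simplex
    return q

losses = np.array([0.1, 2.3, 0.7, 1.9, 0.2])
q = topk_worst_case_q(losses, k=2)
print(q @ losses)                            # top-2 average loss = (2.3 + 1.9) / 2
```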
Minimax Formulations in Stochastic/Adversarial RL
In zero-sum Markov games and minimax Q-learning, the minimax value function is characterized via saddle-point Bellman equations, and convergence-accelerated stochastic approximation schemes are provably faster by leveraging contraction properties of relaxed operators (Diddigi et al., 2019, Wainwright, 2019). Variance-reduced Q-learning matches the minimax sample complexity lower bound up to logarithmic factors (Wainwright, 2019).
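A minimal minimax Q-learning sketch for a finite zero-sum Markov game follows. The simulator interface `step`, the exploration policy, and the hyperparameters are hypothetical assumptions; only the Bellman backup with an exact LP solve of the stage game follows the standard minimax Q-learning scheme.

```python
# Minimax (zero-sum) Q-learning on a finite Markov game, stage games solved by LP.
import numpy as np
from scipy.optimize import linprog

def game_value(M):
    """Value of the zero-sum matrix game max_pi min_j pi^T M[:, j]."""
    n, m = M.shape
    c = np.r_[np.zeros(n), -1.0]                    # maximize v  <=>  minimize -v
    A_ub = np.c_[-M.T, np.ones(m)]                  # v <= pi^T M[:, j] for every column j
    b_ub = np.zeros(m)
    A_eq = np.r_[np.ones(n), 0.0].reshape(1, -1)    # pi is a probability vector
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

def minimax_q(n_states, n_a, n_b, step, gamma=0.9, alpha=0.1, episodes=2000):
    """step(s, a, b) -> (reward, next_state, done): a hypothetical simulator."""
    Q = np.zeros((n_states, n_a, n_b))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a, b = rng.integers(n_a), rng.integers(n_b)    # uniform exploration
            r, s_next, done = step(s, a, b)
            target = r + (0.0 if done else gamma * game_value(Q[s_next]))
            Q[s, a, b] += alpha * (target - Q[s, a, b])    # minimax Bellman backup
            s = s_next
    return Q
```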
Robust, Distributional, and Regret-based Learning
Minimax regret optimization (MRO) replaces worst-case risk minimization (DRO), $\min_{h} \max_{P \in \mathcal{P}} L_P(h)$, with
$$\min_{h \in \mathcal{H}} \; \max_{P \in \mathcal{P}} \; \Big( L_P(h) - \min_{h' \in \mathcal{H}} L_P(h') \Big),$$
yielding uniformly small excess loss across all plausible test distributions in $\mathcal{P}$, rather than overfitting to the single most adversarial distribution (Agarwal et al., 2022). A finite-family sketch follows below.
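To make the contrast with DRO concrete, here is a minimal sketch under illustrative assumptions (a finite set of sample reweightings as $\mathcal{P}$ and a one-dimensional hypothesis grid); only the objective $L_P(h) - \min_{h'} L_P(h')$ follows MRO.

```python
# Minimax regret optimization (MRO) vs. DRO over a finite family of
# candidate test distributions and a finite hypothesis grid.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=200)
y = 1.5 * X + rng.normal(scale=0.5, size=200)

# Candidate test distributions: reweight the sample toward small / large X.
weights = [np.ones(200), (X < 0).astype(float), (X > 0).astype(float)]
weights = [w / w.sum() for w in weights]

grid = np.linspace(0.0, 3.0, 61)                      # hypotheses h(x) = theta * x

def risk(theta, w):
    return np.sum(w * (theta * X - y) ** 2)           # weighted squared loss L_P(h)

risks = np.array([[risk(t, w) for w in weights] for t in grid])   # (|grid|, |P|)
regrets = risks - risks.min(axis=0, keepdims=True)    # subtract per-distribution best
theta_mro = grid[np.argmin(regrets.max(axis=1))]      # minimize worst-case regret
theta_dro = grid[np.argmin(risks.max(axis=1))]        # compare: worst-case risk (DRO)
print(theta_mro, theta_dro)
```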
The minimax supervised learning framework (Farnia et al., 2016) derives the Bayes-optimal predictor via a maximum-conditional-entropy principle and solves for regularized convex GLM estimators with strong generalization guarantees.
3. Representative Applications and Domains
| Domain | Minimax Objective/Formulation | Source |
|---|---|---|
| Online multi-objective learning | Coordinate AMF-regret, softmax-weighted convex-concave surrogate games | (Lee et al., 2021) |
| Multi-task learning | Max-of-task-risk (or $\ell_p$-risk) objective over the empirical task-risk vector | (Mehta et al., 2012) |
| Structured prediction | Minimax excess risk over factor-graph hypothesis class | (Bello et al., 2019) |
| Bilevel/Meta-learning | Minimax saddle reformulation via the lower-level optimality gap | (Wang et al., 2023) |
| Online bandit/robust loss | $\min_{w} \max_{q \in \mathcal{Q}} \sum_i q_i \ell_i(w)$ for $\mathcal{Q}$ a (capped) simplex | (Roux et al., 2021) |
| Reinforcement/game learning | Markovian minimax value, zero-sum RL Bellman equations | (Diddigi et al., 2019) |
| Distribution shift/robustness | Minimax regret across test reweightings | (Agarwal et al., 2022) |
| Supervised learning (robust) | Minimax expected loss over ambiguity set | (Farnia et al., 2016) |
Explicit minimax learning has been used for:
- Adaptive online calibration and group fairness (multicalibration, multicalibeating as a minimax groupwise calibration game).
- Structured prediction: tight lower and upper bounds for minimax risk and sample complexity which depend on combinatorial (factor-graph) dimensions, rather than on the size of the exponentially large output space (Bello et al., 2019).
- Distribution shift and domain adaptation: robust generalization and adaptation using minimax regret metrics (Agarwal et al., 2022).
- Reinforcement learning with adversarial perturbation or safety constraint: minimax DSAC and adversarial meta-RL via Stackelberg game/saddle-point methods (Ren et al., 2020, Li et al., 2022).
4. Theoretical Structure: Duality, Surrogates, and Generalizations
- The classical minimax theorem requires convexity-concavity in the loss; in many learning scenarios (e.g., vector-valued non-concave max-of-losses, 0-1 loss games), this fails. Analytical surrogates, chiefly via softmax-weighted or convex-compositional aggregation, restore convexity-concavity, enabling the application of Sion's or von Neumann's theorem to guarantee existence of saddle points for the surrogate games (Lee et al., 2021); the key sandwich inequality is displayed after this list.
- Duality theory underpins both the strong maximum-conditional-entropy result (Farnia et al., 2016) and the construction of minimax estimators from Bayes rules against least-favorable priors (Gupta et al., 2020).
- Finite-support and combinatorial characterizations extend minimax duality to infinite games, especially under 0/1 losses, when the game's matrix lacks infinite triangular submatrices (Hanneke et al., 2021).
- For multi-task and group-robust settings, intermediate objectives (e.g., loss-compositional objectives built from $\ell_p$ norms of the task-risk vector, or $\alpha$-minimax relaxations) allow trading off absolute worst-case risk against fairness/robustness and average performance (Mehta et al., 2012, Lee et al., 2021).
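The step enabling the softmax surrogate in the first bullet above is the standard log-sum-exp sandwich: the non-concave coordinate-wise maximum is approximated within an additive $\tfrac{\log d}{\eta}$ by a smooth aggregate whose maximizing weights form an explicit simplex vector, so that, with the weights $p$ frozen at each round, the surrogate payoff $\langle p, \ell(x, y)\rangle$ is linear in the adversary's randomization and convex in the learner's, restoring the hypotheses of Sion's theorem (notation as in Section 2):
$$\max_{j \in [d]} \ell_j(x, y) \;\le\; \frac{1}{\eta} \log \sum_{j=1}^{d} \exp\big(\eta\, \ell_j(x, y)\big) \;\le\; \max_{j \in [d]} \ell_j(x, y) + \frac{\log d}{\eta}.$$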
5. Statistical Minimax Rates and Lower Bounds
- In both "classical" and structured settings, minimax sample complexity for excess risk scales as , where is the relevant VC or factor-graph dimension (Bello et al., 2019).
- For learning kernels in operators, minimax rates for the mean-squared error are governed by the Sobolev smoothness of the kernel and the spectral decay of the associated operator, with distinct rates under polynomial versus exponential eigenvalue decay and matching upper and lower bounds (Zhang et al., 27 Feb 2025).
- In active learning, the minimax label complexity can improve on passive rates and is tightly characterized by the star number and the VC dimension, with detailed dependence on noise conditions (e.g., Tsybakov/Bernstein) (Hanneke et al., 2014).
- For function estimation and robust regression, explicit minimax estimators can be constructed by playing an online game between a best-response (Bayes) oracle and a follow-the-perturbed-leader adversary, achieving risk at most the minimax value plus a regret term that vanishes with the number of rounds (Gupta et al., 2020); a toy instance is sketched below.
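The toy sketch below is in the spirit of Gupta et al. (2020) but deviates from the paper in two labeled ways: the adversary uses exponential weights over a discretized parameter set rather than follow-the-perturbed-leader, and the final rule is the Bayes estimator under the averaged prior rather than the averaged sequence of per-round estimators. The problem (Gaussian mean on a bounded grid, squared error) and hyperparameters are illustrative assumptions.

```python
# Learning an (approximately) minimax estimator by online game playing:
# Gaussian mean estimation with squared error on a bounded parameter grid.
import numpy as np

rng = np.random.default_rng(0)
thetas = np.linspace(-1.0, 1.0, 41)          # discretized parameter set
noise = rng.normal(size=2000)                # shared noise draws for risk estimates
eta, rounds = 2.0, 200
cum_risk = np.zeros_like(thetas)
avg_prior = np.zeros_like(thetas)

def bayes_estimate(x, prior):
    """Posterior mean under the discrete prior, for X ~ N(theta, 1)."""
    lik = np.exp(-0.5 * (x[:, None] - thetas[None, :]) ** 2) * prior
    return (lik @ thetas) / lik.sum(axis=1)

for _ in range(rounds):
    prior = np.exp(eta * (cum_risk - cum_risk.max()))      # adversary: exponential
    prior /= prior.sum()                                   # weights on per-theta risk
    avg_prior += prior / rounds
    # Learner best-responds (Bayes); adversary's payoff is the per-theta risk
    # of that estimator, estimated via Monte Carlo with shared noise draws.
    risks = np.array([np.mean((bayes_estimate(t + noise, prior) - t) ** 2)
                      for t in thetas])
    cum_risk += risks

final = lambda x: bayes_estimate(np.atleast_1d(x), avg_prior)   # approx. minimax rule
print(float(final(0.3)))
```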
6. Extensions, Specializations, and Open Directions
- Minimax and Regret: By recasting worst-case performance in terms of regret to an oracle (rather than risk alone), minimax formulations can avoid degeneracy in the presence of non-uniform or irreducible noise, yielding more informative guarantees and uniformly small excess risk (Agarwal et al., 2022).
- Robust Shared Representation Learning: StablePCA fits a minimax convex relaxation of multi-source PCA over the Fantope, producing group-robust projections optimized against the worst-case (maximal) error across sources. Convergence and practical certifiability are established via duality gaps and spectral criteria (Wang et al., 2 May 2025); a simplified alternating heuristic is sketched after this list.
- Computational Considerations: Efficient minimax learning depends critically on the structure of the adversarial set (e.g., extreme point sparsity in capped simplex constraints), enabling scalable online solvers for high-dimensional or combinatorial settings (Roux et al., 2021).
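The following simplified heuristic conveys the group-robust PCA idea by alternating multiplicative weights over sources (on per-source reconstruction error) with PCA of the reweighted covariance. It is not StablePCA's Fantope-based convex relaxation; the sources, rank, and step size are illustrative assumptions.

```python
# Simplified minimax-PCA heuristic for group-robust shared representations.
import numpy as np

def robust_pca(covs, k, eta=1.0, iters=100):
    """covs: list of per-source covariance matrices; k: projection rank."""
    m = len(covs)
    q = np.full(m, 1.0 / m)                        # weights over sources
    for _ in range(iters):
        C = sum(qi * Ci for qi, Ci in zip(q, covs))
        vals, vecs = np.linalg.eigh(C)
        V = vecs[:, -k:]                           # top-k eigenvectors of weighted covariance
        # Per-source reconstruction error of the current projection.
        errs = np.array([np.trace(Ci) - np.trace(V.T @ Ci @ V) for Ci in covs])
        q *= np.exp(eta * (errs - errs.max()))     # upweight the worst-hit sources
        q /= q.sum()
    return V, q

rng = np.random.default_rng(0)
covs = [np.cov(rng.normal(size=(100, 5)), rowvar=False) for _ in range(3)]
V, q = robust_pca(covs, k=2)
```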
Ongoing research addresses minimax learning under non-convexities (bilevel, structured games, adversarial sampling with bi-level or Stackelberg structure), generalization under function approximation, and extensions to complex data modalities and distributional assumptions.
References:
- "Online Minimax Multiobjective Optimization: Multicalibeating and Other Applications" (Lee et al., 2021)
- "Minimax Multi-Task Learning and a Generalized Loss-Compositional Paradigm for MTL" (Mehta et al., 2012)
- "Efficient Online-Bandit Strategies for Minimax Learning Problems" (Roux et al., 2021)
- "A Generalized Minimax Q-learning Algorithm for Two-Player Zero-Sum Stochastic Games" (Diddigi et al., 2019)
- "Variance-reduced Q-learning is minimax optimal" (Wainwright, 2019)
- "A Minimax Approach to Supervised Learning" (Farnia et al., 2016)
- "Minimax Regret Optimization for Robust Machine Learning under Distribution Shift" (Agarwal et al., 2022)
- "Effective Bilevel Optimization via Minimax Reformulation" (Wang et al., 2023)
- "StablePCA: Learning Shared Representations across Multiple Sources via Minimax Optimization" (Wang et al., 2 May 2025)
- "Minimax rates for learning kernels in operators" (Zhang et al., 27 Feb 2025)
- "Minimax bounds for structured prediction" (Bello et al., 2019)
- "Online Learning with Simple Predictors and a Combinatorial Characterization of Minimax in 0/1 Games" (Hanneke et al., 2021)
- "Towards Minimax Online Learning with Unknown Time Horizon" (Luo et al., 2013)
- "Improving Generalization of Reinforcement Learning with Minimax Distributional Soft Actor-Critic" (Ren et al., 2020)
- "Sampling Attacks on Meta Reinforcement Learning: A Minimax Formulation and Complexity Analysis" (Li et al., 2022)
- "Learning Minimax Estimators via Online Learning" (Gupta et al., 2020)
- "Minimax Analysis of Active Learning" (Hanneke et al., 2014)
- "Minimax Learning for Remote Prediction" (Li et al., 2018)