
Minimax Learning Formulation

Updated 12 January 2026
  • Minimax Learning Formulation is a framework that optimizes the worst-case risk to ensure robustness against adversarial settings.
  • It is applied in online learning, multi-objective optimization, supervised, reinforcement, and meta-learning, providing unifying algorithmic strategies.
  • The approach leverages game-theoretic and saddle-point dualities to yield rigorous guarantees, scalable algorithms, and practical performance bounds.

A minimax learning formulation specifies a learning criterion, algorithm, or statistical bound in terms of the worst-case (maximal) value of a risk, loss, regret, or error function over an adversarially chosen environment, task, or data distribution. This formulation is fundamental in online learning, multi-objective optimization, supervised and reinforcement learning, robust estimation, meta-learning, and distributional robustness. The minimax perspective provides both rigorous guarantees (often via game-theoretic or saddle-point dualities) and unifying algorithmic principles for confronting non-stochasticity, adversarial scenarios, multiple objectives, or distributional shift.

1. Foundational Concepts and Formulations

The central principle underlying minimax learning is the optimization of a learner's strategy (e.g., parameters, policies, predictors) to control the worst-case value of a performance metric against a set of adversarial choices (nature, adversary, data distribution, task, or loss coordinate). The canonical settings admit the following general mold:

  • Supervised Learning: Minimize the worst-case expected loss over an ambiguity set $\Gamma$ of data distributions,

$$\psi^* = \arg\min_{\psi\in\Psi} \max_{P\in\Gamma} \mathbb{E}_{P}[L(Y, \psi(X))]$$

as in the maximum-conditional-entropy principle (Farnia et al., 2016); a minimal numerical sketch of this worst-case pattern appears at the end of this section.

  • Multi-objective Online Learning: The learner and adversary play a vector-valued game over rounds $t=1,\dots,T$; the learner's AMF-regret is compared to the benchmark where the adversary must move first, bypassing the classic minimax theorem due to nonconvexity:

$$R^T = \max_{j \in [d]} \left[ \sum_{t=1}^T \ell^t_j(x^t, y^t) - \sum_{t=1}^T w^t_A \right]$$

where $w^t_A = \sup_{y\in Y^t} \min_{x\in X^t} \max_{j\in[d]} \ell^t_j(x, y)$ (Lee et al., 2021).

  • Multi-task Learning (MTL): Replace the mean task risk minimization by a max-risk (worst task) objective:

$$\min_{\theta} \max_{i=1,\dots,T} \hat R_i(\theta)$$

embedded in a generalized loss-compositional paradigm (Mehta et al., 2012).

  • Reinforcement Learning / Games: For Markov games and adversarial RL, the value (e.g., discounted sum of rewards) to a player is defined as a saddle value over policy classes:

$$V^*(s) = \min_{\pi_2} \max_{\pi_1} V^{\pi_1, \pi_2}(s)$$

which leads to Q-learning algorithms and convergence theorems for the minimax solution (Diddigi et al., 2019).

  • Bilevel Problems: Under strong convexity, bilevel programs can be exactly recast as minimax saddle problems,

$$\min_{x} \max_{y} \left\{ F(x, y) - \frac{\gamma}{2} \|\nabla_y f(x, y)\|^2 \right\}$$

(Wang et al., 2023).

Minimax rates, risk bounds, and learning algorithms are then analyzed in terms of these worst-case objectives.
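
To make the shared pattern concrete, here is a minimal, hedged sketch of the supervised worst-case formulation referenced above: the ambiguity set $\Gamma$ is approximated by a finite collection of reweightings of the empirical distribution, the inner maximization selects the currently worst reweighting, and the outer minimization takes a subgradient step. The function names, the logistic loss, and the finite-set approximation are illustrative assumptions, not the construction of Farnia et al.

```python
import numpy as np

def logistic_loss(theta, X, y):
    """Per-example logistic loss, labels y in {-1, +1}."""
    return np.log1p(np.exp(-y * (X @ theta)))

def worst_case_erm(X, y, weight_rows, lr=0.1, steps=500):
    """Illustrative minimax ERM: min_theta max_{P in Gamma} E_P[L].

    weight_rows has shape (k, n): each row is one candidate reweighting
    of the n training points, standing in for the ambiguity set Gamma.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        losses = logistic_loss(theta, X, y)        # shape (n,)
        risks = weight_rows @ losses               # risk under each candidate P
        p = weight_rows[np.argmax(risks)]          # inner max: worst distribution
        sigma = 1.0 / (1.0 + np.exp(y * (X @ theta)))
        grad = -(X * (p * sigma * y)[:, None]).sum(axis=0)   # subgradient of max-risk
        theta -= lr * grad                         # outer min: descent step
    return theta
```

Because each per-reweighting risk is convex in $\theta$, their pointwise maximum is convex as well, so the step above descends a convex worst-case objective; the same min-over-learner, max-over-adversary skeleton recurs in the other formulations listed above.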

2. Principal Theoretical Guarantees and Algorithmic Frameworks

Online Minimax Multiobjective Optimization

Lee et al. (Lee et al., 2021) provide a generic online minimax multiobjective algorithm where, at each round $t$, the learner aggregates the $d$-vector of losses via a "softmax" assignment,

$$\chi^t_j = \frac{ \exp(\eta S_j^{t-1}) }{ \sum_{i=1}^d \exp(\eta S_i^{t-1}) }$$

and plays a minimax-optimal $x^t$ for the surrogate convex-concave payoff $u^t(x, y) = \sum_{j=1}^d \chi^t_j \ell^t_j(x, y)$. The AMF-regret satisfies $R^T \le 4C\sqrt{T \ln d}$ for bounded losses, simultaneously yielding optimal rates for external, internal, swap, interval, and multi-group regrets as well as Blackwell approachability.
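
The weighting step is simple to state in code. The sketch below computes the weights $\chi^t$ stably from cumulative coordinate losses and wires them into one schematic round; solving the resulting convex-concave surrogate game for $x^t$ is problem-specific, so it is passed in as a user-supplied callable. All names (`softmax_weights`, `amf_round`, `solve_surrogate_game`) are illustrative, not from the paper.

```python
import numpy as np

def softmax_weights(cumulative_losses, eta):
    """chi_j proportional to exp(eta * S_j), computed with a stable shift."""
    scores = eta * np.asarray(cumulative_losses, dtype=float)
    scores -= scores.max()            # subtract max for numerical stability
    w = np.exp(scores)
    return w / w.sum()

def amf_round(S, eta, solve_surrogate_game, loss_vector):
    """One schematic round: weight coordinates, play the surrogate game, update S.

    S holds cumulative coordinate losses S_j^{t-1}; solve_surrogate_game(chi)
    should return the learner's x and the adversary's response y for
    u(x, y) = sum_j chi_j * l_j(x, y); loss_vector(x, y) returns the d
    per-coordinate losses of that round.  Both callables are placeholders.
    """
    chi = softmax_weights(S, eta)
    x, y = solve_surrogate_game(chi)
    return S + loss_vector(x, y), x, y
```

The exponential weighting is what replaces the non-concave max over coordinates with a convex-concave surrogate on which a minimax-optimal move exists.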

Loss-Compositional and Minimax Multi-task Learning

Mehta et al. (Mehta et al., 2012) generalize multi-task learning to minimax and intermediate $\ell_p$-risk objectives. Their main LTL theorem (sketched) implies that controlling the max empirical risk on $T$ training tasks exponentially reduces the probability that a fresh test task will have high excess risk, providing justification for minimax MTL when worst-case generalization is required.
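
A concrete way to see the loss-compositional spectrum: aggregate the vector of per-task empirical risks with an $\ell_p$ composition, so that $p=1$ recovers the summed (average-style) objective and $p \to \infty$ recovers the minimax (worst-task) objective. The snippet and its numbers are purely illustrative.

```python
import numpy as np

def lp_aggregate(task_risks, p):
    """l_p loss composition: (sum_i r_i^p)^(1/p) for nonnegative task risks."""
    r = np.asarray(task_risks, dtype=float)
    if np.isinf(p):
        return r.max()            # minimax MTL: worst-task risk
    return (r ** p).sum() ** (1.0 / p)

risks = [0.10, 0.35, 0.12]              # empirical risks of T = 3 tasks
print(lp_aggregate(risks, 1))           # 0.57   (sum of task risks)
print(lp_aggregate(risks, 4))           # ~0.352 (already close to the max)
print(lp_aggregate(risks, np.inf))      # 0.35   (worst task)
```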

Convex-linear Minimax via Bandit Strategies

Roux et al. (Roux et al., 2021) present scalable algorithms for minimax convex-linear learning with adversarial distributional aggregation: $\min_{w \in \mathcal{W}} \max_{p \in \mathcal{K}} \langle L(w), p \rangle$ with $\mathcal{K} \subseteq \mathcal{S}_n$ representing the adversary's set (e.g., capped-sparsity simplex for max-loss, top-$k$ loss, or DRO). Online full-information OCO and combinatorial bandit approaches are combined to obtain $O(1/\sqrt{T})$ primal-dual gaps with per-round computation scaling only with the support size $k$ of the extreme points of $\mathcal{K}$.
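
The structural point about extreme points can be illustrated directly: for the top-$k$ (capped-sparsity) set, the adversary's best response to the current loss vector places uniform mass on the $k$ largest losses, so its support has size exactly $k$. The helper below is an illustrative sketch of that best response, not the bandit algorithm of Roux et al.

```python
import numpy as np

def topk_best_response(losses, k):
    """Adversary's best response over the capped simplex:
    uniform weight 1/k on the k largest current losses."""
    p = np.zeros_like(losses, dtype=float)
    p[np.argpartition(losses, -k)[-k:]] = 1.0 / k
    return p

losses = np.array([0.2, 1.3, 0.7, 0.9, 0.1])
print(topk_best_response(losses, k=2))   # mass 0.5 on indices 1 and 3
```

A full-information solver would alternate this best response with a gradient step on $\langle L(w), p\rangle$; the per-round cost scales with $k$ because only the supported losses need gradients, which is the sparsity the bandit variant exploits.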

Minimax Formulations in Stochastic/Adversarial RL

In zero-sum Markov games and minimax Q-learning, the minimax value function is characterized via saddle-point Bellman equations, and convergence-accelerated stochastic approximation schemes are provably faster by leveraging contraction properties of relaxed operators (Diddigi et al., 2019, Wainwright, 2019). Variance-reduced Q-learning matches the minimax sample complexity lower bound $O\big(D/((1-\gamma)^3\epsilon^2)\big)$ up to logarithmic factors (Wainwright, 2019).
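
At each state, the saddle-point Bellman backup reduces to solving a small zero-sum matrix game over the two players' actions, which can be done with a standard linear program. The sketch below shows only that inner step on an assumed stage-game payoff matrix; the surrounding Q-learning or value-iteration loop, state indexing, and names are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value and maximizing mixed strategy of max_x min_y x^T A y,
    where A[i, j] is the payoff to the maximizing player."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                   # maximize the game value v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])      # v - sum_i x_i A[i, j] <= 0 for all j
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0                              # sum_i x_i = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=np.array([1.0]),
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1], res.x[:m]

# One minimax backup at a single state: Q_s is the (|A1| x |A2|) stage game,
# and V*(s) is the value of that matrix game (matching pennies here).
Q_s = np.array([[1.0, -1.0],
                [-1.0, 1.0]])
value, strategy = matrix_game_value(Q_s)
print(value, strategy)                             # ~0.0, [0.5, 0.5]
```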

Robust, Distributional, and Regret-based Learning

Minimax regret optimization (MRO) replaces worst-case risk minimization (DRO) with

$$\min_{h \in \mathcal{H}} \sup_{w \in \mathcal{W}} \left\{ R_w(h) - R_w(h_w) \right\}$$

yielding uniformly small excess loss across all plausible test distributions in $\mathcal{W}$, rather than overfitting the most adversarial instance (Agarwal et al., 2022).
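
With a finite set of candidate test distributions, the difference between DRO and MRO is just whether the per-distribution oracle risk $R_w(h_w)$ is subtracted before taking the worst case. The toy risk matrix below is invented for illustration; it shows the two criteria selecting different models.

```python
import numpy as np

# Rows: candidate models h in H; columns: candidate test distributions w in W.
risk = np.array([[0.30, 0.10],    # h0
                 [0.25, 0.24],    # h1
                 [0.20, 0.28]])   # h2
oracle = risk.min(axis=0)         # R_w(h_w): best achievable risk per distribution

dro_pick = risk.max(axis=1).argmin()              # min_h of worst-case risk   -> h1
mro_pick = (risk - oracle).max(axis=1).argmin()   # min_h of worst-case excess -> h0
print(dro_pick, mro_pick)                         # 1 0
```

DRO is pulled toward the first distribution simply because its irreducible risk is high (0.20 even under the best model), whereas MRO selects the model whose excess over each distribution's best achievable risk is uniformly small.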

The minimax supervised learning framework (Farnia et al., 2016) derives the Bayes-optimal predictor via a maximum-conditional-entropy principle and solves for regularized convex GLM estimators with strong generalization guarantees.

3. Representative Applications and Domains

Domain | Minimax Objective/Formulation | Source
Online multi-objective learning | Coordinate AMF-regret, softmax-weighted convex-concave surrogate games | (Lee et al., 2021)
Multi-task learning | Max-of-task-risk (or $\ell_p$-risk) objective over the empirical task-risk vector | (Mehta et al., 2012)
Structured prediction | Minimax excess risk over factor-graph hypothesis class | (Bello et al., 2019)
Bilevel/Meta-learning | $\min_x \max_y \{F(x, y) - \frac{\gamma}{2}\|\nabla_y f(x, y)\|^2\}$ | (Wang et al., 2023)
Online bandit/robust loss | $\min_{w} \max_{p\in\mathcal{K}} \langle L(w), p\rangle$ for simplex-structured $\mathcal{K}$ | (Roux et al., 2021)
Reinforcement/game learning | Markovian minimax value, zero-sum RL Bellman equations | (Diddigi et al., 2019)
Distribution shift/robustness | Minimax regret across test reweightings | (Agarwal et al., 2022)
Supervised learning (robust) | Minimax expected loss over ambiguity set $\Gamma$ | (Farnia et al., 2016)

Explicit minimax learning has been used for:

  • Adaptive online calibration and group fairness (multicalibration, multicalibeating as a minimax groupwise calibration game).
  • Structured prediction: tight lower and upper bounds for minimax risk and sample complexity which depend on combinatorial (factor-graph) dimensions, rather than on $|\mathcal{Y}|$ (Bello et al., 2019).
  • Distribution shift and domain adaptation: robust generalization and adaptation using minimax regret metrics (Agarwal et al., 2022).
  • Reinforcement learning with adversarial perturbation or safety constraint: minimax DSAC and adversarial meta-RL via Stackelberg game/saddle-point methods (Ren et al., 2020, Li et al., 2022).

4. Theoretical Structure: Duality, Surrogates, and Generalizations

  • The classical minimax theorem requires convexity-concavity in the loss; in many learning scenarios (e.g., vector-valued non-concave max-of-losses, 0-1 loss games), this fails. Analytical surrogates—chiefly via softmax-weighted or convex-compositional aggregation—restore convexity-concavity, enabling the application of Sion's or von Neumann's theorem to guarantee existence of saddle points for surrogate games (Lee et al., 2021).
  • Duality theory underpins both the strong maximum-conditional-entropy result (Farnia et al., 2016) and the construction of minimax estimators from Bayes rules against least-favorable priors (Gupta et al., 2020).
  • Finite-support and combinatorial characterizations extend minimax duality to infinite games, especially under 0/1 losses, when the game's matrix lacks infinite triangular submatrices (Hanneke et al., 2021).
  • For multi-task and group-robust settings, intermediate objectives (e.g., loss-compositional via $\ell_p$ norms for $1\leq p < \infty$ or $\alpha$-minimax relaxations) allow trading off absolute worst-case risk against fairness/robustness and average performance (Mehta et al., 2012, Lee et al., 2021).

5. Statistical Minimax Rates and Lower Bounds

  • In both "classical" and structured settings, minimax sample complexity for excess risk scales as $n(\epsilon) = \Theta(d/\epsilon^2)$, where $d$ is the relevant VC or factor-graph dimension (Bello et al., 2019).
  • For learning kernels in operator theory, minimax rates for mean-square error depend on spectral (Sobolev) smoothness and operator decay:
    • Polynomial decay: $R_M \sim M^{-2\beta r/(2\beta r+2r+1)}$
    • Exponential decay: $R_M \sim M^{-\beta/(\beta+1)}$
    • with matching upper/lower bounds (Zhang et al., 27 Feb 2025); a numerical instance of the polynomial-decay rate is given after this list.
  • In active learning, the minimax label complexity can improve on the corresponding passive rates and is tightly characterized by the star number and VC dimension, with detailed dependence on noise conditions (e.g., Tsybakov/Bernstein) (Hanneke et al., 2014).
  • For function estimation and robust regression, explicit minimax estimators can be constructed via online game playing with best-response (Bayes) oracles and follow-the-perturbed-leader adversaries, achieving risk at most $R^*$ plus an $O(1/\sqrt{T})$ regret term (Gupta et al., 2020).
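
To read the polynomial-decay rate above, consider a purely illustrative instantiation with smoothness $\beta = 2$ and decay exponent $r = 1$ (values assumed for concreteness, not taken from the cited paper):

$$R_M \sim M^{-\frac{2\beta r}{2\beta r + 2r + 1}} = M^{-\frac{4}{7}} \approx M^{-0.57},$$

so doubling the number of samples $M$ reduces the mean-square error by a factor of roughly $2^{-4/7} \approx 0.67$.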

6. Extensions, Specializations, and Open Directions

  • Minimax and Regret: By recasting worst-case performance in terms of regret to an oracle (rather than risk alone), minimax formulations can avoid degeneracy in the presence of non-uniform or irreducible noise, yielding more informative guarantees and uniformly small excess risk (Agarwal et al., 2022).
  • Robust Shared Representation Learning: StablePCA fits a minimax convex relaxation to multi-source PCA using the Fantope, producing group-robust projections optimized for maximal error across sources. Convergence and practical certifiability are established via duality gaps and spectral criteria (Wang et al., 2 May 2025).
  • Computational Considerations: Efficient minimax learning depends critically on the structure of the adversarial set (e.g., extreme point sparsity in capped simplex constraints), enabling scalable online solvers for high-dimensional or combinatorial settings (Roux et al., 2021).

Ongoing research addresses minimax learning under non-convexities (bilevel, structured games, adversarial sampling with bi-level or Stackelberg structure), generalization under function approximation, and extensions to complex data modalities and distributional assumptions.

