Minimax Weighted Expected Regret (MWER)

Updated 19 May 2026
  • MWER is a decision-theoretic criterion that uses a weighted set of probabilities to minimize expected regret, blending Bayesian and minimax approaches.
  • It employs rigorous axiomatic foundations and likelihood-based updating to maintain consistency in both static and dynamic decision settings.
  • MWER finds practical application in reinforcement and online learning, providing tractable algorithms with strong theoretical performance guarantees.

Minimax Weighted Expected Regret (MWER) is a decision-theoretic criterion that generalizes classical minimax expected regret to settings where uncertainty is represented not by a single probability measure, nor by an unweighted set, but by a weighted set of probability measures. This framework provides a rigorous approach for robust decision-making under ambiguity, interpolating smoothly between Bayesian expected utility and traditional minimax regret, and supports a fully axiomatic characterization in both static and dynamic (updating) settings (Halpern et al., 2013, Halpern et al., 2012). MWER has been developed across decision theory, reinforcement learning, and online learning, delivering tight theoretical bounds and tractable algorithms for robust yet adaptive choice.

1. Foundations: Weighted Sets of Probabilities and Regret-Based Decision Rules

MWER is defined in terms of a weighted set of probabilities on a finite state space $S$. Let $P^+ = \{ (Pr, \alpha_{Pr}) : Pr \text{ is a probability on } S,\, 0 \le \alpha_{Pr} \le 1 \}$, where each $\alpha_{Pr}$ quantifies the significance or credibility of $Pr$. The normalization constraint $\sup_{Pr} \alpha_{Pr} = 1$ ensures comparability across measures.

Given a set $X$ of possible prizes and a utility function $u: X \to \mathbb{R}$, a Savage act is $f: S \to X$. For every feasible act $f$ and menu $M$ of available acts, the ex post optimal utility in state $s$ is $u_M(s) = \max_{g \in M} u(g(s))$. The regret of act $f$ in $s$ is $reg_M(f, s) = u_M(s) - u(f(s))$, with expected regret under $Pr$ given by $reg_{M,Pr}(f) = \sum_{s \in S} Pr(s)\, reg_M(f, s)$. The weighted expected regret (WER) for $f$ is

$$reg_{M,P^+}(f) = \sup_{(Pr, \alpha_{Pr}) \in P^+} \alpha_{Pr}\, reg_{M,Pr}(f).$$

The MWER decision rule selects any act in $M$ minimizing this quantity.

When all weights are unity (i.e., $\alpha_{Pr} = 1$ for all $Pr$), MWER reduces to standard minimax expected regret. When $P^+$ is a singleton, MWER becomes subjective expected utility maximization, making it a natural generalization capable of interpolating between fully robust and fully Bayesian behaviors (Halpern et al., 2013, Halpern et al., 2012).
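The definitions above can be sketched in a few lines of Python for a finite state space. This is a minimal illustration, not the authors' implementation: the function names, the two-act menu, and all numbers are hypothetical, and each act is represented directly by its per-state utilities.

```python
def weighted_expected_regret(act, menu, weighted_set):
    """Weighted expected regret of `act` relative to `menu`.

    menu: dict mapping act name -> list of utilities, one per state.
    weighted_set: list of (pr, alpha) pairs, where pr is a tuple of state
    probabilities and alpha is the weight in [0, 1].
    """
    n_states = len(menu[act])
    # Ex post optimal utility in each state, taken over the whole menu.
    best = [max(utils[s] for utils in menu.values()) for s in range(n_states)]
    regrets = [best[s] - menu[act][s] for s in range(n_states)]
    # Weight-scaled expected regret, worst case over the weighted set.
    return max(
        alpha * sum(p * r for p, r in zip(pr, regrets))
        for pr, alpha in weighted_set
    )


def mwer_choice(menu, weighted_set):
    """Return an act in the menu that minimizes weighted expected regret."""
    return min(menu, key=lambda a: weighted_expected_regret(a, menu, weighted_set))


# Hypothetical two-state menu (utilities per state) and two candidate measures.
menu = {"bold": [4, 0], "safe": [2, 3]}
weighted = [((0.8, 0.2), 1.0), ((0.2, 0.8), 0.25)]    # second measure discounted
unweighted = [((0.8, 0.2), 1.0), ((0.2, 0.8), 1.0)]   # all weights equal to 1

choice_weighted = mwer_choice(menu, weighted)
choice_unweighted = mwer_choice(menu, unweighted)
```

In this toy menu the discounted second measure lets the weighted rule pick "bold", while setting every weight to 1 recovers the minimax expected regret choice "safe".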

2. Weight Assignment and Likelihood-Based Updating

Initial weights $\alpha_{Pr}$ can arise from subjective confidence, expert judgment, or second-order priors. The only requirement is proper normalization ($\sup_{Pr} \alpha_{Pr} = 1$).

Upon observing new information $E$ (with $\sup_{Pr} \alpha_{Pr} Pr(E) > 0$), MWER employs a likelihood updating rule. Each original pair $(Pr, \alpha_{Pr})$ with $Pr(E) > 0$ is replaced by $(Pr \mid E, \alpha_{Pr \mid E})$, where

$$\alpha_{Pr \mid E} = \frac{\alpha_{Pr}\, Pr(E)}{\sup_{Pr'} \alpha_{Pr'}\, Pr'(E)},$$

and all measures yielding the same conditional are merged by taking the supremum of their weights. Dividing by the supremum ensures that the updated set is again normalized.

This updating preserves consistency, in that sequential updates commute: $(P^+ \mid E) \mid F = (P^+ \mid F) \mid E$. Additionally, under repeated observations generated by some true measure $Pr^*$, the weights concentrate on $Pr^*$ almost surely, so MWER converges to expected utility under the true $Pr^*$ (Halpern et al., 2013, Halpern et al., 2012).
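The likelihood updating step can be sketched as follows for a finite state space. This is an illustrative sketch under stated assumptions: `likelihood_update`, the three-state example, and the rounding used to detect identical conditionals are all hypothetical choices made here, not part of the cited papers.

```python
def likelihood_update(weighted_set, event):
    """Condition a weighted set of measures on `event` (a set of state indices).

    weighted_set: list of (pr, alpha) pairs with pr a tuple of probabilities.
    Returns a dict mapping each conditional measure (as a tuple) to its new
    weight. Measures with the same conditional are merged by keeping the
    largest weight.
    """
    # alpha * Pr(E) for each measure; the largest value is the normalizer.
    likelihoods = [sum(pr[s] for s in event) for pr, _ in weighted_set]
    norm = max(alpha * p_e for (_, alpha), p_e in zip(weighted_set, likelihoods))
    if norm <= 0:
        raise ValueError("event has zero weighted likelihood under every measure")

    updated = {}
    for (pr, alpha), p_e in zip(weighted_set, likelihoods):
        if p_e <= 0:
            continue  # measures that rule out the event are dropped
        # Conditional measure; rounding is an assumption made here so that
        # numerically equal conditionals share one dictionary key.
        cond = tuple(
            round(pr[s] / p_e, 12) if s in event else 0.0 for s in range(len(pr))
        )
        weight = alpha * p_e / norm
        updated[cond] = max(updated.get(cond, 0.0), weight)
    return updated


# Hypothetical example: three states, two measures, observe E = {0, 1}.
prior = [((0.5, 0.25, 0.25), 1.0), ((0.1, 0.1, 0.8), 1.0)]
posterior = likelihood_update(prior, {0, 1})
```

Here the measure that assigned probability 0.75 to the event keeps weight 1, while the measure that assigned it 0.2 is discounted to 0.2 / 0.75, so the updated set stays normalized.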

3. Axiomatic Characterization: Static and Dynamic

MWER is fully characterized by an axiomatic system within the Anscombe–Aumann framework. For static (single-stage) choice, the following conditions must be satisfied (for every menu $M$, with $\succeq_M$ denoting preference relative to $M$):

  1. Transitivity: If $f \succeq_M g$ and $g \succeq_M h$, then $f \succeq_M h$.
  2. Completeness: For any $f, g$, either $f \succeq_M g$ or $g \succeq_M f$.
  3. Non-triviality: There exist $f, g$ with $f \succ_M g$.
  4. Monotonicity: If $f$ state-wise dominates $g$, then $f \succeq_M g$.
  5. Mixture continuity: Preferences are continuous under convex mixtures.
  6. Ambiguity aversion: If $f \sim_M g$, then $pf + (1-p)g \succeq_{M \cup \{pf + (1-p)g\}} g$.
  7. Independence: Preference between $f$ and $g$ is stable under independent mixing with any $h$.
  8. Menu-independence for constants: For constant acts, preferences do not depend on the menu.
  9. INA: Adding acts never strictly optimal in any state does not change relative preferences among the rest.
  10. Boundedness: Every menu admits a dominating constant act.

There is a representation theorem: preferences satisfying these axioms correspond precisely to MWER, with unique (up to affine transformation) utility and maximal normalized $P^+$ (Halpern et al., 2013, Halpern et al., 2012).

In dynamic settings, with sequential observations, an additional axiom applies:

  • Menu-Dependent Dynamic Consistency (MDC): If, after learning $E$, $f$ is preferred to $g$, then before learning $E$, the conditional act "play $f$ on $E$, $h$ otherwise" is preferred to the analogous $g$-plan.

This extension ensures that, after likelihood updating, preferences continue to admit an MWER representation with the appropriately updated $P^+ \mid E$.

4. MWER in Robust Sequential Learning and Reinforcement Learning

MWER admits a natural formalization for sequential decision-making problems, notably in reinforcement learning (RL) and online learning (Bongole et al., 2024, Moroshko et al., 2013). Given an unknown Markov Decision Process (MDP) parameterized by $\theta \in \Theta$, the regret of a policy $\pi$ is

$$R(\pi, \theta) = V^*(\theta) - V^\pi(\theta),$$

where $V^\pi(\theta)$ is the expected cumulative reward of $\pi$ and $V^*(\theta)$ the value of the optimal policy for $\theta$. Defining a weighted prior $\mu$ over $\Theta$, the weighted expected regret is $\mathbb{E}_{\theta \sim \mu}[R(\pi, \theta)]$. The minimax weighted expected regret is then

$$\inf_{\pi} \sup_{\mu} \mathbb{E}_{\theta \sim \mu}[R(\pi, \theta)].$$

A minimax duality theorem shows that, under standard regularity (convexity, compactness, continuity), MWER coincides with classical minimax regret:

$$\inf_{\pi} \sup_{\theta} R(\pi, \theta) = \inf_{\pi} \sup_{\mu} \mathbb{E}_{\theta \sim \mu}[R(\pi, \theta)] = \sup_{\mu} \inf_{\pi} \mathbb{E}_{\theta \sim \mu}[R(\pi, \theta)]$$

(Bongole et al., 2024).
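A toy numerical check can make the duality statement concrete. The sketch below is not taken from the cited work: it assumes a hypothetical regret matrix with two policies and two candidate models, lets the learner randomize over policies (which supplies the convexity the theorem requires), and compares the two sides of the minimax equality by grid search.

```python
def minimax_and_maximin(regret, steps=100):
    """Grid-search both sides of the minimax equality for a 2x2 regret matrix.

    regret[i][j] is the regret of policy i under model j (hypothetical values).
    """
    grid = [i / steps for i in range(steps + 1)]

    # Learner mixes the two policies with probabilities (p, 1 - p);
    # nature then picks the worst model for that mixture.
    minimax = min(
        max(p * regret[0][j] + (1 - p) * regret[1][j] for j in range(2))
        for p in grid
    )

    # Nature commits to a prior (q, 1 - q) over models; the learner
    # best-responds, and a pure policy suffices against a fixed prior.
    maximin = max(
        min(q * regret[i][0] + (1 - q) * regret[i][1] for i in range(2))
        for q in grid
    )
    return minimax, maximin


# Each policy is optimal for one model and suffers regret 1 under the other.
toy_regret = [[0.0, 1.0], [1.0, 0.0]]
minimax_value, maximin_value = minimax_and_maximin(toy_regret)
```

For this matrix both quantities equal 0.5, attained by the uniform mixture over policies and the uniform prior over models. Restricting the learner to pure policies would give a worst-case regret of 1, which is why the mixing assumption matters.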

MWER enables the direct use of information-theoretic Bayesian regret bounds to obtain robust minimax rates, including for finite-horizon MDPs, linear and contextual bandits. For example, in multi-armed bandits with $K$ arms and $T$ rounds, MWER achieves $\tilde{O}(\sqrt{KT})$ regret. This framework reduces robust sequential learning to the optimization of weighted expected regret, facilitating tractable approximation and computation via duality and game-theoretic techniques (Bongole et al., 2024).

5. Algorithmic Realization: Weighted Minimax in Online Learning

In online linear regression with adversarial labels, MWER is instantiated by the Weighted Last-Step Min-Max (WEMM) algorithm (Moroshko et al., 2013). At each round $t$, the algorithm predicts using the weighted least-squares solution formed from the history, with weights $a_t$ selected so as to ensure feasibility of the min-max saddle point. The weighted cumulative loss for a comparator $u$ is

$$L_T^a(u) = \sum_{t=1}^{T} a_t \left( y_t - u^\top x_t \right)^2$$

and the algorithm guarantees

$$L_T(\text{WEMM}) \le \inf_{u} \left( b \lVert u \rVert^2 + L_T^a(u) \right)$$

for any feasible weight sequence, where $b > 0$ is a regularization constant, delivering zero weighted minimax regret.

By careful design, including recursive updates and data-driven choice of $a_t$, the difference between $L_T^a(u)$ and the standard unweighted loss $L_T(u)$ can be controlled, yielding logarithmic or sub-logarithmic regret in $T$ when the data or labels are favorable. The approach extends to weakly non-stationary environments, where regret is measured relative to slowly drifting comparators.
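A one-dimensional sketch shows how a weighted least-squares forecaster of this kind can be run recursively. It is an illustration under explicit assumptions rather than the published algorithm: the scalar feature, the regularization constant `b`, and the weight choice `a_t = 1 / (1 - x_t^2 / A_{t-1})` are choices made here for the example, and the function name is hypothetical.

```python
def weighted_last_step_demo(xs, ys, b=1.0):
    """Run a weighted regularized least-squares forecaster on scalar data.

    Returns (algorithm loss, regularized weighted comparator loss, weights).
    """
    A = b      # b + sum of a_s * x_s^2 over past rounds
    B = 0.0    # sum of a_s * y_s * x_s over past rounds
    alg_loss = 0.0
    weights = []
    for x, y in zip(xs, ys):
        if x * x >= A:
            raise ValueError("weight undefined: need x^2 < A for this choice")
        y_hat = B * x / A                # weighted least-squares prediction
        alg_loss += (y - y_hat) ** 2     # unweighted squared loss of the forecaster
        a = 1.0 / (1.0 - x * x / A)      # assumed weight choice, always >= 1
        weights.append(a)
        A += a * x * x                   # recursive update of the statistics
        B += a * y * x
    # Closed form of min over u of  b*u^2 + sum_t a_t * (y_t - u*x_t)^2.
    comparator = sum(a * y * y for a, y in zip(weights, ys)) - B * B / A
    return alg_loss, comparator, weights


# Hypothetical data with |x| < 1 so the weights are well defined for b = 1.
alg_loss, comparator_loss, demo_weights = weighted_last_step_demo(
    [0.5, -0.3, 0.8, 0.1], [1.0, -0.2, 0.7, 0.0]
)
```

With this particular weight choice the forecaster's cumulative squared loss matches the regularized weighted comparator loss up to rounding, which is one way to read "zero weighted regret" in the scalar case. The gap to the unweighted comparator loss is then governed by how far the weights exceed 1.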

Compared to prior last-step min-max forecasters that required known bounds and uniform weights, WEMM achieves improved constants, relaxes the need for a-priori adversarial bounds, and is competitive in environments with mild non-stationarity (Moroshko et al., 2013).

6. Relation to Classical Decision Criteria and Properties

MWER rigorously interpolates between minimax expected regret (MER) and subjective expected utility (SEU):

  • When all $\alpha_{Pr} = 1$, MWER coincides with MER, fully robust to ambiguity.
  • When $P^+$ contains a single measure $Pr$ and $\alpha_{Pr} = 1$, MWER coincides with SEU, fully Bayesian.
  • As learning progresses and likelihood-based updating concentrates, MWER transitions smoothly from MER to SEU, capturing learning from data (Halpern et al., 2013, Halpern et al., 2012).

Distinctive features of MWER include:

  • Ambiguity sensitivity: MWER handles ambiguity aversion through its axiomatic basis.
  • Menu dependence: Preferences can depend on the set of available acts, reflecting the regret criterion's sensitivity to alternative actions, in contrast to maximin expected utility (MMEU), which is menu-independent.
  • Dynamic consistency: Through its updating rule and dynamic axioms, MWER ensures that plans made before and after new information are revealed are mutually consistent in behavior.
  • Overcoming set-model limitations: MWER remedies the inability of pure set-based probability models to learn relative likelihoods through data and avoids the collapse to SEU of second-order probability approaches.

7. Illustrative Example and Implications

A representative example is the delivery robot problem, with two possible states (“1 broken cake”, “10 broken cakes”) and three acts (“continue”, “back”, “check”). Initial symmetric weights $\alpha_{Pr} = 1$ lead MWER to behave identically to MER. With repeated favorable observations (e.g., "first $k$ cakes are unbroken"), likelihood updating increases the relative weight on the more plausible measure $Pr^*$, and MWER shifts toward SEU-optimal actions for $Pr^*$. This demonstrates MWER’s ability to interpolate between robust and data-driven decision-making, adapting preferences as weights evolve through learning (Halpern et al., 2013, Halpern et al., 2012).

In summary, MWER unifies and extends foundational approaches to robust choice under ambiguity, admits explicit and tractable updating, supports strong theoretical guarantees across decision theory, online learning, and reinforcement learning, and is characterized through natural and interpretable axioms. The framework enables both principled robust decision-making and smooth adaptation as information accrues.
