Prediction with Expert Advice

Updated 24 June 2025

Prediction with expert advice is a classical paradigm in online learning in which a learner combines sequential advice from a set of experts to make predictions, adapting dynamically to minimize its loss relative to the best-performing expert or combination of experts in hindsight. The objective is to develop algorithms with tight regret guarantees, efficient computation, and robustness to various forms of expert behavior, including adaptive and "second-guessing" experts whose advice depends on the learner's own actions. This framework underpins many developments in sequential prediction, online convex optimization, and adaptive ensemble methods.

1. Theoretical Foundations and Problem Setting

In the prototypical prediction with expert advice scenario, the learner operates over $N$ rounds. At round $n$:

  • Each expert $k$ provides a predictive function $\gamma^k: [0,1] \to [0,1]$ specifying advice, which may be a static recommendation or, in the general framework, a function of the learner’s prediction $p_n$.
  • The learner selects its own forecast $p_n \in [0,1]$.
  • The actual outcome $w_n \in \{0,1\}$ is revealed.
  • Losses are incurred according to a specified loss function $\lambda(w, p)$, such as the quadratic loss $(p - w)^2$ or the log loss $-\ln(p)$ for $w = 1$ and $-\ln(1-p)$ for $w = 0$.

The learner's performance is measured by its cumulative loss $L_N = \sum_{n=1}^N \lambda(w_n, p_n)$ relative to each expert's realized loss $L^k_N = \sum_{n=1}^N \lambda(w_n, \gamma^k(p_n))$. The standard regret guarantee sought is $$L_N \leq \min_{k=1,\dotsc,K} L^k_N + a_K,$$ where $a_K$ is a constant depending on the number of experts and the loss function.
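
A minimal sketch of this protocol in Python, assuming static expert advice and the losses defined above; the function and variable names (`run_protocol`, `forecaster`, etc.) are illustrative, not taken from the literature:

```python
import numpy as np

def quadratic_loss(w, p):
    """lambda(w, p) = (p - w)^2 for a binary outcome w and forecast p in [0, 1]."""
    return (p - w) ** 2

def log_loss(w, p):
    """lambda(w, p) = -ln p if w = 1, and -ln(1 - p) if w = 0."""
    return -np.log(p) if w == 1 else -np.log(1.0 - p)

def run_protocol(expert_advice, outcomes, forecaster, loss=quadratic_loss):
    """Play N rounds: expert_advice is an (N, K) array of static advice,
    outcomes an (N,) array in {0, 1}, forecaster maps a round's advice to p_n.
    Returns the learner's cumulative loss and each expert's cumulative loss."""
    N, K = expert_advice.shape
    learner_loss, expert_loss = 0.0, np.zeros(K)
    for n in range(N):
        p_n = forecaster(expert_advice[n])   # learner commits to a forecast
        w_n = outcomes[n]                    # outcome is revealed
        learner_loss += loss(w_n, p_n)
        expert_loss += np.array([loss(w_n, g) for g in expert_advice[n]])
    return learner_loss, expert_loss

# Example: naive averaging of advice; regret = L_N - min_k L_N^k.
rng = np.random.default_rng(0)
advice = rng.uniform(size=(100, 3))
outcomes = rng.integers(0, 2, size=100)
L_N, L_k = run_protocol(advice, outcomes, forecaster=np.mean)
print("regret vs best expert:", L_N - L_k.min())
```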

2. Defensive Forecasting: Core Principles and Algorithm

Defensive forecasting is a methodology that constructs the learner’s predictions so that a carefully chosen forecast-continuous supermartingale never increases over time. The predictive process is defined as follows:

  1. At each round, all experts announce continuous prediction functions $\gamma^k$.
  2. The learner selects $p_n \in [0,1]$.
  3. The actual binary outcome $w_n$ is revealed.
  4. Losses are incurred as described above.

A central mathematical device is the supermartingale $$S_N := \sum_{k=1}^K w_k \exp \left( \kappa \sum_{n=1}^N \left[ \lambda(w_n, p_n) - \lambda(w_n, \gamma^k(p_n)) \right] \right),$$ where the $w_k$ are normalized weights and $\kappa$ is a parameter reflecting the mixability of the loss.

The key mechanism, formalized by a lemma of Levin and Takemura, is that for any forecast-continuous supermartingale, the learner can choose pnp_n at each round such that SS does not increase, regardless of the experts’ strategies. This property ensures robust hedging against all possible outcome realizations. The existence of such a pnp_n follows from Ky Fan’s minimax theorem, applicable due to the continuity in the learner’s prediction and experts' advice.
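
A minimal sketch of one such round under the quadratic loss with $\kappa = 2$; the grid search over candidate forecasts is an illustrative stand-in for the exact minimax/fixed-point computation guaranteed by the lemma, and the names (`defensive_round`, `update_weights`) are ours:

```python
import numpy as np

KAPPA = 2.0  # mixability constant for the quadratic loss

def quadratic_loss(w, p):
    return (p - w) ** 2

def defensive_round(weights, experts, grid_size=2001):
    """Pick p in [0, 1] so the supermartingale S does not (appreciably) increase.

    weights: current per-expert terms W_k = w_k * exp(kappa * accumulated differences)
    experts: list of callables gamma_k(p) -> advice in [0, 1] (may depend on p)
    """
    S_prev = weights.sum()
    candidates = np.linspace(0.0, 1.0, grid_size)

    def worst_case_increment(p):
        # Increment of S if the outcome is w, for w in {0, 1}; take the worse one.
        advice = np.array([gamma(p) for gamma in experts])
        incs = []
        for w in (0, 1):
            factors = np.exp(KAPPA * (quadratic_loss(w, p) - quadratic_loss(w, advice)))
            incs.append((weights * factors).sum() - S_prev)
        return max(incs)

    # Candidate whose worst-case increment is smallest (<= 0 up to grid resolution).
    return min(candidates, key=worst_case_increment)

def update_weights(weights, experts, p, w):
    """Multiply each W_k by exp(kappa * [loss(learner) - loss(expert k)])."""
    advice = np.array([gamma(p) for gamma in experts])
    return weights * np.exp(KAPPA * (quadratic_loss(w, p) - quadratic_loss(w, advice)))
```

With uniform initial weights `np.ones(K) / K`, one would call `defensive_round`, observe the outcome, and then apply `update_weights`; since $S_0 = 1$, keeping every increment non-positive keeps $S_N \leq 1$, which is exactly what the regret bounds of the next section rest on.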

3. Regret Guarantees for Mixable Loss Functions

For perfectly mixable loss functions (such as quadratic and log losses), defensive forecasting achieves tight regret bounds:

  • Quadratic loss $(p-w)^2$, with $\kappa = 2$:

$$L_N \leq L^k_N + \frac{\ln K}{2}$$

  • Log loss, with $\kappa = 1$:

$$L_N \leq L^k_N + \ln K$$

  • General perfectly mixable loss:

$$L_N \leq L^k_N + \frac{\ln K}{\eta}$$

where $\eta$ is the maximal mixability constant for the loss.
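
A brief sketch of how these bounds follow from the supermartingale property, assuming uniform weights $w_k = 1/K$ and taking $\kappa$ equal to the mixability constant $\eta$: every summand of $S_N$ is non-negative and $S$ never increases from $S_0 = 1$, so for each expert $k$ $$\frac{1}{K} \exp\bigl(\eta \, (L_N - L^k_N)\bigr) \leq S_N \leq S_0 = 1, \qquad\text{and therefore}\qquad L_N - L^k_N \leq \frac{\ln K}{\eta}.$$ Specializing $\eta = 2$ (quadratic loss) and $\eta = 1$ (log loss) recovers the first two bounds.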

These bounds exactly match those attainable by the Aggregating Algorithm (AA) in the fixed-expert case, as formalized in Watkins’s theorem.

4. Handling Second-Guessing Experts

A significant extension provided by defensive forecasting is the capacity to handle “second-guessing” experts, whose prediction functions $\gamma^k(p)$ depend continuously on the learner’s current $p$. The only requirement is the continuity of $\gamma^k$ in $p$; no further regularity is needed.

Classic approaches, such as the AA, cannot generally accommodate this scenario because they rely on fixed expert predictions and thus cannot account for the feedback loop induced by second-guessing. Defensive forecasting, by ensuring the supermartingale property for all possible dependencies, remains robust and optimal in this more general setting.
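
To make the feedback loop concrete, here is a small self-contained sketch with two second-guessing experts under the quadratic loss and uniform weights; the “yes-man” and “contrarian” labels and all names are illustrative. Because both advice functions are continuous in $p$, a forecast whose worst-case supermartingale increment is non-positive still exists, and a simple scan over $[0,1]$ locates it:

```python
import numpy as np

KAPPA = 2.0

def sq(w, p):
    return (p - w) ** 2

# Second-guessing experts: advice is a continuous function of the learner's own p.
experts = [lambda p: p,           # "yes-man": always agrees with the learner
           lambda p: 1.0 - p]     # "contrarian": always takes the opposite view
weights = np.ones(len(experts)) / len(experts)

def worst_case_increment(p):
    advice = np.array([g(p) for g in experts])
    return max((weights * np.exp(KAPPA * (sq(w, p) - sq(w, advice)))).sum() - weights.sum()
               for w in (0, 1))

grid = np.linspace(0.0, 1.0, 1001)
p_star = min(grid, key=worst_case_increment)
print(p_star, worst_case_increment(p_star))  # a p with non-positive worst-case increment
```

Here the scan settles on $p = 0.5$, where the contrarian’s and yes-man’s advice coincide with the learner’s forecast and the supermartingale cannot grow for either outcome.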

5. Comparison to Aggregating Algorithm and Practical Implications

The Aggregating Algorithm is optimal among algorithms using exponential weighting of fixed expert advice under perfectly mixable losses, providing the same regret guarantees as defensive forecasting in the fixed-expert regime. However, defensive forecasting surpasses the AA in generality:

  • It supports experts whose advice changes adaptively (second-guessing),
  • It does not require randomization or any specific exponential weighting of predictions,
  • It is constructive: at each round, the learner’s action is computed as the minimax solution to a continuous function, typically implementable via root finding over $[0,1]$ due to the continuity properties.

A summary table:

| Setting | Regret Bound | Handles second-guessing experts? |
|---|---|---|
| AA (fixed experts) | $\frac{\ln K}{\eta}$ | No |
| Defensive Forecasting | $\frac{\ln K}{\eta}$ | Yes |

6. Algorithmic and Practical Considerations

To implement defensive forecasting:

  • At each round and for each possible outcome, the learner computes the effect of a prospective $p_n$ on the supermartingale and selects a prediction to ensure $S$ does not increase.
  • For mixable losses with simple form (like square or log loss), the required computation reduces to solving a one-dimensional minimization or root-finding problem, which is efficient in practice (see the sketch after this list).
  • Performance (in terms of regret) is not affected by whether the experts’ advice is adaptive (second-guessing), provided the advice remains continuous in $p$ and mixability holds.
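
One simple way to carry out the root-finding step, sketched below for fixed experts under the quadratic loss with uniform weights; the bisection exploits the fact that the outcome-1 increment is non-increasing and the outcome-0 increment non-decreasing in $p$, and the names (`defensive_forecast`, `increment`) are illustrative rather than the canonical algorithm:

```python
import numpy as np

KAPPA = 2.0  # the quadratic loss is eta-mixable for eta <= 2

def sq(w, p):
    return (p - w) ** 2

def increment(p, w, weights, advice):
    """Change in S if the learner plays p and the outcome is w (fixed advice)."""
    return (weights * np.exp(KAPPA * (sq(w, p) - sq(w, advice)))).sum() - weights.sum()

def defensive_forecast(weights, advice, tol=1e-9):
    """Return p in [0, 1] with increment(p, 0) <= 0 and increment(p, 1) <= 0.

    increment(., 1) is non-increasing and increment(., 0) non-decreasing in p, so it
    suffices to bisect for the smallest p at which the w = 1 increment becomes
    non-positive; mixability guarantees the w = 0 increment is also non-positive there.
    """
    if increment(0.0, 1, weights, advice) <= 0:
        return 0.0
    lo, hi = 0.0, 1.0  # increment(1.0, 1, ...) <= 0 always holds for the quadratic loss
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if increment(mid, 1, weights, advice) <= 0:
            hi = mid
        else:
            lo = mid
    return hi

# One round with three fixed experts and uniform weights:
advice = np.array([0.2, 0.5, 0.9])
weights = np.ones(3) / 3
p = defensive_forecast(weights, advice)
print(p, increment(p, 0, weights, advice), increment(p, 1, weights, advice))
```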

Potential limitations include the need for numerically solving the minimax problem at each round, though for many losses and settings this is tractable and can be parallelized or approximated as needed.

7. Significance and Influence

Defensive forecasting both matches the optimal regret rates of classical approaches in standard settings and greatly expands the scope of sequential prediction methods to more complex expert behaviors, including continuous and adaptive response to the learner’s strategy. This advances theoretical understanding and enables new applications in scenarios where experts are learning, strategic, or adaptive, laying the groundwork for robust online ensemble prediction in adversarial and feedback-rich environments.