Minimax optimal dynamic regret under time-varying arm sets

Determine whether minimax optimal dynamic regret can be achieved for non-stationary linear bandits when the feasible arm set varies over time, rather than being fixed, and establish algorithms or lower bounds that resolve this question.

Background

Most existing minimax results for non-stationary bandits assume a fixed arm set. The MASTER algorithm attains the optimal rate under this assumption, whereas the weighted strategy proposed by Wang et al. handles time-varying arm sets but comes with no proof of optimality. The authors explicitly state that it remains unclear whether optimal regret can be achieved in the time-varying arm set setting.

Resolving this open question requires characterizing the optimal dynamic regret rate when the arm set changes over time, a situation common in practical applications, and possibly designing algorithms that match that rate.
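
To make the quantity in question concrete, the sketch below computes dynamic regret for a linear bandit whose arm set differs from round to round: in each round the learner is compared against the best arm available in that round under that round's parameter. It also computes the path length of the parameter sequence, a common (but here assumed) measure of non-stationarity. The function names `dynamic_regret` and `path_length`, the toy arm sets, and the parameter sequence are all hypothetical and chosen for illustration; none of them come from the cited paper.

```python
import numpy as np


def dynamic_regret(arm_sets, thetas, chosen):
    """Dynamic regret with a per-round arm set.

    arm_sets: list of (K_t, d) arrays, one feasible arm set per round
    thetas:   (T, d) array, the unknown reward parameter in each round
    chosen:   list of indices, the arm the learner played in each round
    """
    total = 0.0
    for arms, theta, idx in zip(arm_sets, thetas, chosen):
        expected = arms @ theta  # expected reward of every arm available this round
        total += expected.max() - expected[idx]  # gap to this round's best arm
    return float(total)


def path_length(thetas):
    """Sum of Euclidean distances between consecutive parameters."""
    steps = np.diff(thetas, axis=0)
    return float(np.linalg.norm(steps, axis=1).sum())


# Toy instance (assumed data): the arm set changes size and content each round.
thetas = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
arm_sets = [
    np.array([[1.0, 0.0], [0.0, 1.0]]),
    np.array([[0.5, 0.0], [0.0, 1.0]]),
    np.array([[1.0, 0.0], [0.0, 0.5], [0.5, 0.5]]),
]
chosen = [0, 1, 0]

print(dynamic_regret(arm_sets, thetas, chosen))  # 1.0
print(path_length(thetas))                       # sqrt(2), about 1.4142
```

Because the comparator is recomputed from each round's own arm set, the benchmark itself moves even when the parameter stays fixed (rounds 1 and 2 above), which is one way to see why guarantees proved for a fixed arm set do not transfer automatically.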

References

Additionally, we note that our approach can handle time-varying arm set settings, whereas MASTER relies on the fixed arm set assumption. It remains unclear whether optimal regret can be achieved under time-varying arm set.

— Revisiting Weighted Strategy for Non-stationary Parametric Bandits and MDPs (2601.01069 - Wang et al., 3 Jan 2026) in Section 3.2 (Algorithm and Regret Guarantee), concluding paragraph
