Minimax optimal dynamic regret under time-varying arm sets
Determine whether minimax optimal dynamic regret can be achieved for non-stationary linear bandits when the feasible arm set varies over time instead of being fixed, and resolve the question with either an algorithm that attains the optimal rate or a lower bound showing that it cannot be attained.
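For concreteness, the quantity in question can be written as follows. This is a sketch under the standard non-stationary linear bandit model, and the notation is an assumption for illustration, not taken from the cited paper: at round $t$ the learner picks an arm $X_t$ from the feasible set $\mathcal{X}_t \subset \mathbb{R}^d$ and receives reward $X_t^\top \theta_t$ plus noise, where $\theta_t$ is the unknown, drifting parameter and $P_T$ measures its total drift (path length).

```latex
% Dynamic regret against the per-round best arm in the time-varying set \mathcal{X}_t,
% together with the path length P_T of the unknown parameter sequence.
\mathrm{D\text{-}Reg}_T
  \;=\; \sum_{t=1}^{T} \Bigl( \max_{x \in \mathcal{X}_t} x^\top \theta_t \;-\; X_t^\top \theta_t \Bigr),
\qquad
P_T \;=\; \sum_{t=2}^{T} \lVert \theta_t - \theta_{t-1} \rVert_2 .
```

The fixed arm set setting is the special case $\mathcal{X}_t = \mathcal{X}$ for all $t$; the open question is whether the optimal dependence of $\mathrm{D\text{-}Reg}_T$ on $T$ and $P_T$ remains attainable when $\mathcal{X}_t$ is allowed to change from round to round.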
References
Additionally, we note that our approach can handle time-varying arm set settings, whereas MASTER relies on the fixed arm set assumption. It remains unclear whether optimal regret can be achieved under time-varying arm set.
— Revisiting Weighted Strategy for Non-stationary Parametric Bandits and MDPs
(2601.01069 - Wang et al., 3 Jan 2026) in Section 3.2 (Algorithm and Regret Guarantee), concluding paragraph