Lower Bounds for Multi-armed Bandit with Non-equivalent Multiple Plays (1507.04910v1)
Abstract: We study the stochastic multi-armed bandit problem with non-equivalent multiple plays where, at each step, an agent chooses not only a set of arms but also their order, which influences the reward distribution. In several problem formulations with different assumptions, we provide lower bounds on the regret with the standard asymptotics $O(\log{t})$ but novel coefficients, and we provide optimal algorithms, thus proving that these bounds cannot be improved.