Random Shuffling and Resets for the Non-stationary Stochastic Bandit Problem (1609.02139v1)

Published 7 Sep 2016 in cs.AI

Abstract: We consider a non-stationary formulation of the stochastic multi-armed bandit where the rewards are no longer assumed to be identically distributed. For the best-arm identification task, we introduce a version of Successive Elimination based on random shuffling of the $K$ arms. We prove that, under a novel and mild assumption on the mean gap $\Delta$, this simple but powerful modification achieves the same guarantees in terms of sample complexity and cumulative regret as its original version, but in a much wider class of problems, as it is no longer constrained to stationary distributions. We also show that the original {\sc Successive Elimination} fails to have controlled regret in this more general scenario, thus showing the benefit of shuffling. We then remove our mild assumption and adapt the algorithm to the best-arm identification task with switching arms. We adapt the definition of the sample complexity for that case and prove that, against an optimal policy with $N-1$ switches of the optimal arm, this new algorithm achieves an expected sample complexity of $O(\Delta^{-2}\sqrt{NK\delta^{-1} \log(K \delta^{-1})})$, where $\delta$ is the probability of failure of the algorithm, and an expected cumulative regret of $O(\Delta^{-1}\sqrt{NTK \log (TK)})$ after $T$ time steps.
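The core modification the abstract describes, Successive Elimination with the arm order reshuffled on every round, can be sketched as follows. This is an illustrative reading of the idea under standard Hoeffding-style confidence bounds, not the paper's exact algorithm; the function name, the `pull` interface, and the confidence radius are assumptions made for the sketch.

```python
import math
import random

def shuffled_successive_elimination(pull, K, horizon, delta=0.05):
    """Sketch: Successive Elimination with per-round random shuffling.

    `pull(arm)` returns a reward in [0, 1]. Shuffling the active arms
    each round means no arm is systematically sampled earlier than the
    others, which is the modification the paper credits for robustness
    to non-stationary rewards.
    """
    active = list(range(K))
    sums = [0.0] * K     # cumulative reward per arm
    counts = [0] * K     # number of pulls per arm
    t = 0
    while len(active) > 1 and t < horizon:
        random.shuffle(active)          # key step: random shuffling
        for arm in active:
            sums[arm] += pull(arm)
            counts[arm] += 1
            t += 1
        # Hoeffding-style confidence radius (one common choice; the
        # paper's exact constants may differ).
        def radius(arm):
            return math.sqrt(
                math.log(4 * K * counts[arm] ** 2 / delta)
                / (2 * counts[arm]))
        means = {a: sums[a] / counts[a] for a in active}
        # eliminate any arm whose upper bound falls below the best
        # lower bound among the active arms
        best_lcb = max(means[a] - radius(a) for a in active)
        active = [a for a in active
                  if means[a] + radius(a) >= best_lcb]
    return max(active, key=lambda a: sums[a] / counts[a])
```

For instance, with deterministic rewards of 0.9 for arm 0 and 0.1 for the rest, the sketch eliminates the suboptimal arms once their confidence intervals separate and returns arm 0.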

Authors (3)
  1. Robin Allesiardo (3 papers)
  2. Raphaël Féraud (11 papers)
  3. Odalric-Ambrym Maillard (48 papers)
Citations (1)
