Gradient Ascent for Active Exploration in Bandit Problems

Published 20 May 2019 in stat.ML and cs.LG | (1905.08165v1)

Abstract: We present a new algorithm based on gradient ascent for a general Active Exploration bandit problem in the fixed-confidence setting. This problem encompasses several well-studied problems, such as Best Arm Identification and Thresholding Bandits. The algorithm relies on a new sampling rule based on online lazy mirror ascent. We prove that this algorithm is asymptotically optimal and, most importantly, computationally efficient.
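To illustrate what a "lazy" mirror ascent step looks like, here is a minimal sketch, assuming the sampling weights live on the probability simplex and the mirror map is the negative entropy. It is not the authors' sampling rule. The objective the paper actually optimizes is not given in this abstract, so a toy concave function F(w) = -||w - p||^2 (a hypothetical stand-in) is used instead. "Lazy" means the update accumulates gradients in the dual space and maps the running sum back to the simplex, rather than stepping from the previous iterate. With negative entropy, that map is a softmax. The names `lazy_mirror_ascent`, `grad_fn` and `eta` are illustrative only.

```python
import numpy as np


def lazy_mirror_ascent(grad_fn, n_arms, n_steps, eta):
    """Lazy mirror ascent (dual averaging) on the probability simplex.

    With the negative-entropy mirror map, the lazy update
        w_{t+1} = argmax_w  <eta * sum_{s<=t} g_s, w> - sum_i w_i log w_i
    has the closed form w_{t+1} = softmax(eta * sum_{s<=t} g_s).
    """
    cum_grad = np.zeros(n_arms)            # running sum of gradients (dual space)
    w = np.full(n_arms, 1.0 / n_arms)      # start from the uniform allocation
    for _ in range(n_steps):
        cum_grad += grad_fn(w)             # accumulate the gradient at the current weights
        z = eta * cum_grad
        z -= z.max()                       # shift for numerical stability (softmax is shift-invariant)
        w = np.exp(z) / np.exp(z).sum()    # map the dual sum back to the simplex
    return w


# Hypothetical toy objective F(w) = -||w - p||^2, maximised on the simplex at w = p.
p = np.array([0.5, 0.3, 0.2])
w_final = lazy_mirror_ascent(lambda w: -2.0 * (w - p), n_arms=3, n_steps=500, eta=0.5)
print(np.round(w_final, 3))
```

Because each weight vector is a softmax of the cumulative gradient, every iteration costs O(K) for K arms. This kind of cheap per-step update is in keeping with the computational-efficiency claim above. The paper's own objective, step sizes and arm-selection rule may differ.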