  Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation (2006.00475v3)
    Published 31 May 2020 in math.OC, cs.LG, and stat.ML
  
  Abstract: We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ by Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.
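
  To give a sense of the size of the improvement, the two rates quoted in the abstract can be divided term by term. This is only a comparison of the stated rates; the constants hidden by the $O(\cdot)$ notation are ignored, so it should be read as an illustration and not as a claim from the paper:

  $$
  \frac{d^{9.5} \sqrt{n} \log(n)^{7.5}}{d^{2.5} \sqrt{n} \log(n)} = d^{7} \log(n)^{6.5}
  $$

  The $\sqrt{n}$ factor cancels, so the gain is entirely in the dependence on the dimension $d$ and in the power of the logarithm.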