Nonstochastic Bandits with Infinitely Many Experts (2102.05164v2)

Published 9 Feb 2021 in cs.LG and stat.ML

Abstract: We study the problem of nonstochastic bandits with expert advice, extending the setting from finitely many experts to any countably infinite set: A learner aims to maximize the total reward by taking actions sequentially based on bandit feedback while benchmarking against a set of experts. We propose a variant of Exp4.P that, for finitely many experts, enables inference of correct expert rankings while preserving the order of the regret upper bound. We then incorporate the variant into a meta-algorithm that works on infinitely many experts. We prove a high-probability upper bound of $\tilde{\mathcal{O}} \big( i^* K + \sqrt{KT} \big)$ on the regret, up to polylog factors, where $i^*$ is the unknown position of the best expert, $K$ is the number of actions, and $T$ is the time horizon. We also provide an example of structured experts and discuss how to expedite learning in such a case. Our meta-learning algorithm achieves optimal regret up to polylog factors when $i^* = \tilde{\mathcal{O}} \big( \sqrt{T/K} \big)$. If a prior distribution is assumed to exist for $i^*$, the probability of optimality increases with $T$, the rate of which can be fast.
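
As a quick sanity check on the stated optimality condition (a sketch based only on the abstract's bound, not on the paper's proofs): substituting $i^* = \tilde{\mathcal{O}}\big(\sqrt{T/K}\big)$ into the regret bound gives

$$i^* K + \sqrt{KT} = \tilde{\mathcal{O}}\big(\sqrt{T/K} \cdot K\big) + \sqrt{KT} = \tilde{\mathcal{O}}\big(\sqrt{KT}\big),$$

which matches the $\Omega(\sqrt{KT})$ minimax lower bound for adversarial $K$-armed bandits up to polylog factors.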

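For context, below is a minimal sketch of the standard Exp4.P routine (Beygelzimer et al., 2011) for finitely many experts, which the paper's variant builds on. This is the vanilla algorithm, not the authors' modification or their meta-algorithm; the function name `exp4p` and the array layouts of `expert_advice` and `rewards` are illustrative choices.

```python
import numpy as np

def exp4p(expert_advice, rewards, delta=0.05, rng=None):
    """Sketch of the standard Exp4.P algorithm for nonstochastic bandits
    with expert advice (finitely many experts).

    expert_advice: array of shape (T, N, K); expert_advice[t, i] is expert i's
        probability distribution over the K actions at round t.
    rewards: array of shape (T, K) with rewards in [0, 1]; fixed up front here
        purely for simulation convenience.
    delta: confidence parameter for the high-probability guarantee.
    """
    rng = rng or np.random.default_rng(0)
    T, N, K = expert_advice.shape
    p_min = np.sqrt(np.log(N) / (K * T))          # exploration floor
    assert K * p_min <= 1, "sketch assumes T >= K * ln(N)"
    conf = np.sqrt(np.log(N / delta) / (K * T))   # scale of the confidence term
    w = np.ones(N)                                # exponential weights over experts
    total_reward = 0.0

    for t in range(T):
        xi = expert_advice[t]                     # (N, K) advice matrix
        q = w / w.sum()
        p = (1 - K * p_min) * q @ xi + p_min      # mixed action distribution
        a = rng.choice(K, p=p)                    # play an action, get bandit feedback
        r = rewards[t, a]
        total_reward += r

        r_hat = np.zeros(K)                       # importance-weighted reward estimate
        r_hat[a] = r / p[a]
        y_hat = xi @ r_hat                        # estimated reward of each expert
        v_hat = (xi / p).sum(axis=1)              # variance proxy per expert
        w *= np.exp(0.5 * p_min * (y_hat + conf * v_hat))

    return total_reward, w
```

The upper-confidence term `conf * v_hat` in the weight update is what distinguishes Exp4.P from Exp4 and yields a regret bound that holds with probability at least 1 - delta rather than only in expectation.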
Authors (3)
  1. X. Flora Meng (2 papers)
  2. Tuhin Sarkar (10 papers)
  3. Munther A. Dahleh (44 papers)
Citations (1)