Papers
Topics
Authors
Recent
Search
2000 character limit reached

Nonstochastic Bandits with Infinitely Many Experts

Published 9 Feb 2021 in cs.LG and stat.ML | (2102.05164v2)

Abstract: We study the problem of nonstochastic bandits with expert advice, extending the setting from finitely many experts to any countably infinite set: A learner aims to maximize the total reward by taking actions sequentially based on bandit feedback while benchmarking against a set of experts. We propose a variant of Exp4.P that, for finitely many experts, enables inference of correct expert rankings while preserving the order of the regret upper bound. We then incorporate the variant into a meta-algorithm that works on infinitely many experts. We prove a high-probability upper bound of $\tilde{\mathcal{O}} \big( i*K + \sqrt{KT} \big)$ on the regret, up to polylog factors, where $i*$ is the unknown position of the best expert, $K$ is the number of actions, and $T$ is the time horizon. We also provide an example of structured experts and discuss how to expedite learning in such case. Our meta-learning algorithm achieves optimal regret up to polylog factors when $i* = \tilde{\mathcal{O}} \big( \sqrt{T/K} \big)$. If a prior distribution is assumed to exist for $i*$, the probability of optimality increases with $T$, the rate of which can be fast.

Citations (1)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.