Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash 96 tok/s
Gemini 2.5 Pro 55 tok/s Pro
GPT-5 Medium 38 tok/s
GPT-5 High 38 tok/s Pro
GPT-4o 96 tok/s
GPT OSS 120B 466 tok/s Pro
Kimi K2 214 tok/s Pro
2000 character limit reached

Imprecise Multi-Armed Bandits (2405.05673v1)

Published 9 May 2024 in cs.LG and stat.ML

Abstract: We introduce a novel multi-armed bandit framework, where each arm is associated with a fixed unknown credal set over the space of outcomes (which can be richer than just the reward). The arm-to-credal-set correspondence comes from a known class of hypotheses. We then define a notion of regret corresponding to the lower prevision defined by these credal sets. Equivalently, the setting can be regarded as a two-player zero-sum game, where, on each round, the agent chooses an arm and the adversary chooses the distribution over outcomes from a set of options associated with this arm. The regret is defined with respect to the value of game. For certain natural hypothesis classes, loosely analgous to stochastic linear bandits (which are a special case of the resulting setting), we propose an algorithm and prove a corresponding upper bound on regret. We also prove lower bounds on regret for particular special cases.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.

Authors (1)

Youtube Logo Streamline Icon: https://streamlinehq.com