
Budgeted Multi-Armed Bandits with Asymmetric Confidence Intervals (2306.07071v2)

Published 12 Jun 2023 in cs.LG and stat.ML

Abstract: We study the stochastic Budgeted Multi-Armed Bandit (MAB) problem, where a player chooses from $K$ arms with unknown expected rewards and costs. The goal is to maximize the total reward under a budget constraint. A player thus seeks to choose the arm with the highest reward-cost ratio as often as possible. Current state-of-the-art policies for this problem have several issues, which we illustrate. To overcome them, we propose a new upper confidence bound (UCB) sampling policy, $\omega$-UCB, that uses asymmetric confidence intervals. These intervals scale with the distance between the sample mean and the bounds of a random variable, yielding a more accurate and tight estimation of the reward-cost ratio compared to our competitors. We show that our approach has logarithmic regret and consistently outperforms existing policies in synthetic and real settings.
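The abstract's core idea can be illustrated with a small sketch: a budgeted bandit that pulls the arm with the highest optimistic reward-cost ratio, pairing an upper confidence bound on the reward with a lower confidence bound on the cost. This is not the paper's exact $\omega$-UCB policy; as a hedged stand-in for its asymmetric intervals, the sketch uses Wilson-score bounds, whose width likewise depends on how close the sample mean sits to the variable's bounds. The arm parameters, budget, deterministic costs, and exploration schedule below are all illustrative assumptions.

```python
import numpy as np

def wilson_upper(mean, n, z):
    """Upper Wilson-score bound for a [0, 1]-valued mean; asymmetric around the mean."""
    denom = 1.0 + z * z / n
    center = (mean + z * z / (2.0 * n)) / denom
    half = (z / denom) * np.sqrt(mean * (1.0 - mean) / n + z * z / (4.0 * n * n))
    return center + half

def wilson_lower(mean, n, z):
    """Lower Wilson-score bound for a [0, 1]-valued mean."""
    denom = 1.0 + z * z / n
    center = (mean + z * z / (2.0 * n)) / denom
    half = (z / denom) * np.sqrt(mean * (1.0 - mean) / n + z * z / (4.0 * n * n))
    return center - half

rng = np.random.default_rng(0)
true_reward = np.array([0.3, 0.8, 0.5])  # toy Bernoulli reward means (assumption)
true_cost = np.array([0.6, 0.4, 0.5])    # toy per-pull costs, deterministic for simplicity
K, budget = len(true_reward), 500.0

counts = np.zeros(K)
reward_mean = np.zeros(K)
cost_mean = np.zeros(K)

spent, t = 0.0, 0
while spent < budget:
    t += 1
    if t <= K:                              # initialization: pull each arm once
        arm = t - 1
    else:
        z = np.sqrt(2.0 * np.log(t))        # exploration level grows slowly with t
        ucb_r = wilson_upper(reward_mean, counts, z)
        lcb_c = np.maximum(wilson_lower(cost_mean, counts, z), 1e-6)
        arm = int(np.argmax(ucb_r / lcb_c))  # optimistic reward-cost ratio
    r = float(rng.random() < true_reward[arm])
    c = true_cost[arm]
    counts[arm] += 1
    reward_mean[arm] += (r - reward_mean[arm]) / counts[arm]
    cost_mean[arm] += (c - cost_mean[arm]) / counts[arm]
    spent += c

# Arm 1 has the best reward-cost ratio (0.8 / 0.4 = 2.0) and should dominate.
```

Because the Wilson bounds tighten as the sample mean approaches 0 or 1, the optimistic ratio estimate is sharper for arms with extreme means than a symmetric Hoeffding-style bonus would be, which is the intuition behind the paper's asymmetric intervals.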

Authors (4)
  1. Marco Heyden (5 papers)
  2. Vadim Arzamasov (10 papers)
  3. Edouard Fouché (6 papers)
  4. Klemens Böhm (21 papers)
