
Maximal Objectives in the Multi-armed Bandit with Applications (2006.06853v6)

Published 11 Jun 2020 in cs.LG and stat.ML

Abstract: In several applications of the stochastic multi-armed bandit problem, the traditional objective of maximizing the expected total reward can be inappropriate. In this paper, motivated by certain operational concerns in online platforms, we consider a new objective in the classical setup. Given $K$ arms, instead of maximizing the expected total reward from $T$ pulls (the traditional "sum" objective), we consider the vector of total rewards earned from each of the $K$ arms at the end of $T$ pulls and aim to maximize the expected highest total reward across arms (the "max" objective). For this objective, we show that any policy must incur an instance-dependent asymptotic regret of $\Omega(\log T)$ (with a higher instance-dependent constant compared to the traditional objective) and a worst-case regret of $\Omega(K^{1/3}T^{2/3})$. We then design an adaptive explore-then-commit policy featuring exploration based on appropriately tuned confidence bounds on the mean reward and an adaptive stopping criterion, which adapts to the problem difficulty and achieves these bounds (up to logarithmic factors). We then generalize our algorithmic insights to the problem of maximizing the expected value of the average total reward of the top $m$ arms with the highest total rewards. Our numerical experiments demonstrate the efficacy of our policies compared to several natural alternatives in practical parameter regimes. We discuss applications of these new objectives to the problem of grooming an adequate supply of value-providing market participants (workers/sellers/service providers) in online platforms.
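
To make the "max" objective and the explore-then-commit idea concrete, the following is a minimal Python sketch. It is not the paper's exact policy: the round-robin exploration, the Hoeffding-style confidence radius, the stopping rule (commit when one arm's lower confidence bound dominates the others' upper bounds), and the Bernoulli reward model are all illustrative assumptions.

```python
import numpy as np

def max_objective_etc(means, T, delta=0.05, seed=None):
    """Simplified explore-then-commit sketch for the 'max' objective.

    Explores arms round-robin while maintaining confidence bounds on the
    mean rewards, stops exploring once one arm's lower confidence bound
    exceeds every other arm's upper confidence bound (an assumed stopping
    rule, not the paper's exact criterion), then commits all remaining
    pulls to that arm. Returns the highest per-arm total reward.
    """
    rng = np.random.default_rng(seed)
    K = len(means)
    pulls = np.zeros(K)      # number of pulls per arm
    totals = np.zeros(K)     # total reward accumulated per arm
    committed = None

    for t in range(T):
        arm = (t % K) if committed is None else committed
        reward = rng.binomial(1, means[arm])   # Bernoulli rewards (assumption)
        pulls[arm] += 1
        totals[arm] += reward

        if committed is None and pulls.min() >= 1:
            # Hoeffding-style confidence radius; the tuning is illustrative.
            radius = np.sqrt(np.log(2 * K * T / delta) / (2 * pulls))
            emp = totals / pulls
            lcb, ucb = emp - radius, emp + radius
            best = int(np.argmax(emp))
            if lcb[best] >= np.max(np.delete(ucb, best)):
                committed = best

    # The 'max' objective scores the single highest per-arm total, not the sum.
    return totals.max()
```

For example, `max_objective_etc([0.3, 0.5, 0.7], T=5000)` explores briefly and then concentrates the remaining budget on one arm; under the max objective, splitting pulls across arms is wasteful because only the best-performing arm's total counts.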

Authors (2)
  1. Eren Ozbay (2 papers)
  2. Vijay Kamble (13 papers)
