Generalizing distribution of partial rewards for multi-armed bandits with temporally-partitioned rewards (2211.06883v1)

Published 13 Nov 2022 in cs.LG and stat.ML

Abstract: We investigate the Multi-Armed Bandit problem with Temporally-Partitioned Rewards (TP-MAB). In the TP-MAB setting, an agent receives subsets of an arm's reward over multiple rounds rather than the entire reward all at once. We introduce a general formulation of how an arm's cumulative reward is distributed across several rounds, called the Beta-spread property. Such a generalization is needed to handle partitioned rewards in which the maximum reward per round is not distributed uniformly across rounds. We derive a lower bound for the TP-MAB problem under the assumption that the Beta-spread property holds. Moreover, we provide an algorithm, TP-UCB-FR-G, which uses the Beta-spread property to improve the regret upper bound in some scenarios. By generalizing how the cumulative reward is distributed, the setting becomes applicable to a broader range of applications.
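
To make the setting concrete, the following is a minimal sketch of temporally-partitioned rewards, assuming a fixed list of per-round weights that decides how one pull's cumulative reward is spread over later rounds. The names `split_reward`, `observed_per_round`, and `weights` are hypothetical and are not taken from the paper; in particular, the weights are only an illustration of a non-uniform split and are not the paper's definition of the Beta-spread property.

```python
from collections import defaultdict


def split_reward(total, weights):
    """Split one pull's cumulative reward into per-round partial rewards.

    `weights` is a hypothetical, illustrative choice of how the reward is
    spread over rounds; it is not the paper's Beta-spread definition.
    """
    norm = sum(weights)
    return [total * w / norm for w in weights]


def observed_per_round(pulls, weights):
    """Return the reward the agent actually observes in each round.

    `pulls` is a list of (round_pulled, cumulative_reward) pairs. Partial
    rewards from earlier pulls overlap with those of later pulls.
    """
    observed = defaultdict(float)
    for round_pulled, total in pulls:
        for delay, part in enumerate(split_reward(total, weights)):
            observed[round_pulled + delay] += part
    return dict(observed)


# Two pulls; each reward arrives over three rounds, front-loaded (2:1:1).
pulls = [(0, 1.0), (1, 0.5)]
weights = [2, 1, 1]
obs = observed_per_round(pulls, weights)
for rnd in sorted(obs):
    print(rnd, obs[rnd])
```

In this sketch the agent never sees a pull's full reward in a single round, and the front-loaded weights show the kind of non-uniform distribution across rounds that the abstract says a uniform assumption cannot capture.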

Authors (6)
  1. Ronald C. van den Broek (2 papers)
  2. Rik Litjens (2 papers)
  3. Tobias Sagis (3 papers)
  4. Luc Siecker (2 papers)
  5. Nina Verbeeke (2 papers)
  6. Pratik Gajane (19 papers)
