Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Delayed Feedback in Generalised Linear Bandits Revisited (2207.10786v4)

Published 21 Jul 2022 in cs.LG and stat.ML

Abstract: The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, the stringent requirement for immediate rewards is unmet in many real-world applications where the reward is almost always delayed. We study the phenomenon of delayed rewards in generalised linear bandits in a theoretical manner. We show that a natural adaptation of an optimistic algorithm to the delayed feedback achieves a regret bound where the penalty for the delays is independent of the horizon. This result significantly improves upon existing work, where the best known regret bound has the delay penalty increasing with the horizon. We verify our theoretical results through experiments on simulated data.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Benjamin Howson (4 papers)
  2. Ciara Pike-Burke (17 papers)
  3. Sarah Filippi (24 papers)
Citations (13)

Summary

We haven't generated a summary for this paper yet.