Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits (1802.03386v1)

Published 9 Feb 2018 in cs.LG

Abstract: Regret bounds in online learning compare the player's performance to $L*$, the optimal performance in hindsight with a fixed strategy. Typically such bounds scale with the square root of the time horizon $T$. The more refined concept of first-order regret bound replaces this with a scaling $\sqrt{L*}$, which may be much smaller than $\sqrt{T}$. It is well known that minor variants of standard algorithms satisfy first-order regret bounds in the full information and multi-armed bandit settings. In a COLT 2017 open problem, Agarwal, Krishnamurthy, Langford, Luo, and Schapire raised the issue that existing techniques do not seem sufficient to obtain first-order regret bounds for the contextual bandit problem. In the present paper, we resolve this open problem by presenting a new strategy based on augmenting the policy space.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Zeyuan Allen-Zhu (53 papers)
  2. Sébastien Bubeck (90 papers)
  3. Yuanzhi Li (119 papers)
Citations (29)

Summary

We haven't generated a summary for this paper yet.