
First- and Second-Order Bounds for Adversarial Linear Contextual Bandits (2305.00832v3)

Published 1 May 2023 in cs.LG and stat.ML

Abstract: We consider the adversarial linear contextual bandit setting, which allows for the loss functions associated with each of $K$ arms to change over time without restriction. Assuming the $d$-dimensional contexts are drawn from a fixed known distribution, the worst-case expected regret over the course of $T$ rounds is known to scale as $\tilde O(\sqrt{Kd T})$. Under the additional assumption that the density of the contexts is log-concave, we obtain a second-order bound of order $\tilde O(K\sqrt{d V_T})$ in terms of the cumulative second moment of the learner's losses $V_T$, and a closely related first-order bound of order $\tilde O(K\sqrt{d L_T^*})$ in terms of the cumulative loss of the best policy $L_T^*$. Since $V_T$ or $L_T^*$ may be significantly smaller than $T$, these improve over the worst-case regret whenever the environment is relatively benign. Our results are obtained using a truncated version of the continuous exponential weights algorithm over the probability simplex, which we analyse by exploiting a novel connection to the linear bandit setting without contexts.
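
To make "relatively benign" concrete, the rates quoted in the abstract can be compared directly. This is only a rough sketch: it assumes that constants and the logarithmic factors hidden by $\tilde O$ are ignored, and the threshold $T/K$ is derived here from the stated rates rather than quoted from the paper.

$$
K\sqrt{d\,V_T} \le \sqrt{K d\,T}
\iff K^2 d\,V_T \le K d\,T
\iff V_T \le \frac{T}{K}.
$$

The same algebra with $L_T^*$ in place of $V_T$ gives $L_T^* \le T/K$ for the first-order bound. As a purely illustrative case, with $K = 10$ arms and $T = 10^6$ rounds, the second-order rate is the smaller of the two once $V_T$ falls below roughly $10^5$.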

Authors (5)
  1. Julia Olkhovskaya (11 papers)
  2. Jack Mayo (3 papers)
  3. Tim van Erven (32 papers)
  4. Gergely Neu (52 papers)
  5. Chen-Yu Wei (46 papers)
Citations (10)
