LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits (2403.03219v2)

Published 5 Mar 2024 in cs.LG and stat.ML

Abstract: This study considers the linear contextual bandit problem with independent and identically distributed (i.i.d.) contexts. For this problem, existing studies have proposed Best-of-Both-Worlds (BoBW) algorithms whose regret satisfies $O(\log^2(T))$ in the number of rounds $T$ in a stochastic regime with a suboptimality gap lower-bounded by a positive constant, while satisfying $O(\sqrt{T})$ in an adversarial regime. However, the dependency on $T$ has room for improvement, and the suboptimality-gap assumption can be relaxed. To address these issues, this study proposes an algorithm whose regret satisfies $O(\log(T))$ in the setting where the suboptimality gap is lower-bounded. Furthermore, we introduce a margin condition, a milder assumption on the suboptimality gap, which characterizes the problem difficulty linked to the suboptimality gap via a parameter $\beta \in (0, \infty]$. We then show that the algorithm's regret satisfies $O\left(\left\{\log(T)\right\}^{\frac{1+\beta}{2+\beta}} T^{\frac{1}{2+\beta}}\right)$. Here, $\beta = \infty$ corresponds to the case in existing studies where a lower bound on the suboptimality gap exists, and our regret satisfies $O(\log(T))$ in that case. The proposed algorithm is based on Follow-The-Regularized-Leader (FTRL) with the Tsallis entropy and is referred to as the $\alpha$-Linear-Contextual (LC)-Tsallis-INF.
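The abstract does not include pseudocode, so as a rough illustration of the Tsallis-entropy FTRL update the algorithm builds on, here is a minimal sketch of the standard Tsallis-INF arm-probability computation for $\alpha = 1/2$: the distribution $p_i \propto 4\,(\eta(L_i - x))^{-2}$, with the normalizer $x < \min_i L_i$ found by bisection so the probabilities sum to one. The learning rate `eta` and the bisection setup are illustrative assumptions; the paper's actual contextual algorithm additionally conditions this update on the observed context.

```python
def tsallis_inf_probs(cum_losses, eta):
    """Arm-selection probabilities for FTRL with the 1/2-Tsallis entropy
    (the standard Tsallis-INF update; a sketch, not the paper's full
    contextual algorithm).

    Solves p_i = 4 / (eta * (L_i - x))^2 with the normalizer x < min_i L_i
    chosen by bisection so that the probabilities sum to one.
    """
    l_min = min(cum_losses)

    def total(x):
        return sum(4.0 / (eta * (L - x)) ** 2 for L in cum_losses)

    # Bracket the root: total(x) is increasing in x on (-inf, min L_i),
    # blowing up as x -> min L_i and vanishing as x -> -inf.
    hi = l_min - 1e-12
    lo = l_min - 1.0
    while total(lo) > 1.0:  # push the lower bracket out until sum < 1
        lo = l_min - 2.0 * (l_min - lo)

    for _ in range(200):  # bisection on the normalizer x
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            hi = mid
        else:
            lo = mid
    x = 0.5 * (lo + hi)
    return [4.0 / (eta * (L - x)) ** 2 for L in cum_losses]
```

Note the characteristic Tsallis-INF behavior: the arm with the smallest cumulative loss estimate receives the largest probability, while suboptimal arms are still explored at a polynomially (rather than exponentially) decaying rate.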

Authors (2)
  1. Masahiro Kato (49 papers)
  2. Shinji Ito (31 papers)