Best-of-Both-Worlds Algorithms for Linear Contextual Bandits (2312.15433v2)

Published 24 Dec 2023 in cs.LG and stat.ML

Abstract: We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits. Our algorithms deliver near-optimal regret bounds in both the adversarial and stochastic regimes, without prior knowledge about the environment. In the stochastic regime, we achieve the polylogarithmic rate $\frac{(dK)^2\,\mathrm{poly}\log(dKT)}{\Delta_{\min}}$, where $\Delta_{\min}$ is the minimum suboptimality gap over the $d$-dimensional context space. In the adversarial regime, we obtain either the first-order $\widetilde{O}(dK\sqrt{L^*})$ bound or the second-order $\widetilde{O}(dK\sqrt{\Lambda^*})$ bound, where $L^*$ is the cumulative loss of the best action and $\Lambda^*$ is a notion of the cumulative second moment of the losses incurred by the algorithm. Moreover, we develop an algorithm based on FTRL with the Shannon entropy regularizer that does not require knowledge of the inverse of the covariance matrix, and achieves polylogarithmic regret in the stochastic regime while obtaining $\widetilde{O}\big(dK\sqrt{T}\big)$ regret bounds in the adversarial regime.

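The abstract mentions an algorithm built on FTRL with the Shannon entropy regularizer. As a rough illustration only, the sketch below runs exponential weights (the closed form of entropy-regularized FTRL over the simplex) with importance-weighted loss estimates in a plain $K$-armed bandit; it strips away the linear contextual structure and the paper's specific loss estimators, and the environment, learning rate, and all parameter choices are illustrative assumptions rather than the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

K, T = 4, 5000
eta = np.sqrt(np.log(K) / (K * T))              # illustrative learning-rate choice
mean_losses = np.array([0.5, 0.4, 0.6, 0.2])    # hypothetical stochastic environment

L_hat = np.zeros(K)   # cumulative importance-weighted loss estimates
cum_loss = 0.0

for t in range(T):
    # FTRL with Shannon entropy regularizer:
    #   p_t = argmin_p  eta * <L_hat, p> - H(p)   over the probability simplex,
    # whose closed form is the exponential-weights distribution below.
    w = np.exp(-eta * (L_hat - L_hat.min()))    # subtract the min for numerical stability
    p = w / w.sum()

    a = rng.choice(K, p=p)
    loss = float(np.clip(mean_losses[a] + 0.1 * rng.normal(), 0.0, 1.0))
    cum_loss += loss

    # Unbiased importance-weighted estimate of the chosen arm's loss
    L_hat[a] += loss / p[a]

print(f"average loss: {cum_loss / T:.3f}  (best arm mean: {mean_losses.min():.3f})")
```

In the paper's linear contextual setting the cumulative loss estimates would instead be linear functions of the observed context, but the sampling rule keeps this same softmax form induced by the entropy regularizer.
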
Authors (5)
  1. Yuko Kuroki (15 papers)
  2. Alberto Rumi (3 papers)
  3. Taira Tsuchiya (19 papers)
  4. Fabio Vitale (20 papers)
  5. Nicolò Cesa-Bianchi (83 papers)
Citations (2)

Summary

We haven't generated a summary for this paper yet.
