Directional Optimism for Safe Linear Bandits (2308.15006v2)

Published 29 Aug 2023 in cs.LG

Abstract: The safe linear bandit problem is a version of the classical stochastic linear bandit problem in which the learner's actions must satisfy an uncertain constraint at all rounds. Due to its applicability to many real-world settings, this problem has received considerable attention in recent years. By leveraging a novel approach that we call directional optimism, we find that it is possible to achieve improved regret guarantees for both well-separated problem instances and finite star convex action sets. Furthermore, we propose a novel algorithm for this setting that improves on existing algorithms in terms of empirical performance, while enjoying matching regret guarantees. Lastly, we introduce a generalization of the safe linear bandit setting in which the constraints are convex, and we adapt our algorithms and analyses to this setting by leveraging a novel convex-analysis-based approach.
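To make the setting concrete, the following Python sketch illustrates the safe linear bandit protocol together with a generic optimistic-pessimistic strategy of the kind the paper improves upon. This is not the paper's directional-optimism algorithm; all names, the constraint level `budget`, the known-safe baseline action, and the confidence width `beta` are illustrative assumptions. The learner keeps ridge estimates of both the reward and constraint parameters, restricts play to actions whose pessimistic (upper-bound) constraint estimate respects the budget, and acts optimistically on reward within that estimated-safe set.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, lam, beta = 3, 500, 1.0, 2.0   # dimension, horizon, ridge, width (all hypothetical)

theta_star = rng.normal(size=d)      # unknown reward parameter
a_star = rng.normal(size=d)          # unknown constraint parameter
budget = 1.0                         # known constraint level b

actions = rng.uniform(-1, 1, size=(20, d))   # finite action set
actions[0] = np.zeros(d)             # known-safe baseline: <a*, 0> = 0 <= b

# Shared ridge-regression statistics for reward and constraint estimates.
V = lam * np.eye(d)
s_reward = np.zeros(d)
s_constr = np.zeros(d)

for t in range(T):
    Vinv = np.linalg.inv(V)
    theta_hat = Vinv @ s_reward
    a_hat = Vinv @ s_constr
    # Elliptical confidence width sqrt(x^T V^{-1} x) for every action.
    width = beta * np.sqrt(np.einsum('ij,jk,ik->i', actions, Vinv, actions))

    # Pessimistic safety: keep actions whose *upper* constraint bound <= b,
    # so the uncertain constraint holds at every round with high probability.
    safe = actions[actions @ a_hat + width <= budget]
    if len(safe) == 0:
        x_t = actions[0]                     # fall back to the safe baseline
    else:
        # Optimism on reward: play the action with the largest UCB on <theta*, x>.
        ucb = safe @ theta_hat + beta * np.sqrt(
            np.einsum('ij,jk,ik->i', safe, Vinv, safe))
        x_t = safe[np.argmax(ucb)]

    # Environment feedback: noisy reward and noisy constraint observation.
    reward = theta_star @ x_t + rng.normal(scale=0.1)
    constr = a_star @ x_t + rng.normal(scale=0.1)

    V += np.outer(x_t, x_t)
    s_reward += reward * x_t
    s_constr += constr * x_t
```

The pessimistic safety filter is what makes exploration hard: the learner cannot simply try every action to learn `a_star`, which is the tension that the paper's directional-optimism approach is designed to resolve more efficiently.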
