Directional Optimism for Safe Linear Bandits (2308.15006v2)
Abstract: The safe linear bandit problem is a version of the classical stochastic linear bandit problem in which the learner's actions must satisfy an uncertain constraint at all rounds. Due to its applicability to many real-world settings, this problem has received considerable attention in recent years. By leveraging a novel approach that we call directional optimism, we show that it is possible to achieve improved regret guarantees both for well-separated problem instances and for action sets that are finite star-convex sets. Furthermore, we propose a novel algorithm for this setting that improves on existing algorithms in terms of empirical performance while enjoying matching regret guarantees. Lastly, we introduce a generalization of the safe linear bandit setting in which the constraints are convex, and we adapt our algorithms and analyses to this setting via a novel approach based on convex analysis.
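To make the setting concrete, below is a minimal sketch of the optimism-pessimism pattern that safe linear bandit algorithms build on: the learner maintains regularized least-squares estimates of the unknown reward and constraint parameters, restricts play to actions whose pessimistic (upper) constraint estimate is below the threshold, and among those picks the action with the highest optimistic reward estimate. All parameter values, the action set, and the confidence radius `beta` are illustrative assumptions; this is a generic conservative baseline, not the paper's directional-optimism algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, tau = 2, 500, 1.0              # dimension, horizon, constraint threshold (assumed)
theta_true = np.array([1.0, 0.5])    # unknown reward parameter (hypothetical)
a_true = np.array([0.8, 0.6])        # unknown constraint parameter (hypothetical)
actions = rng.normal(size=(50, d))   # finite action set on the unit sphere
actions /= np.linalg.norm(actions, axis=1, keepdims=True)

V = np.eye(d)          # regularized Gram matrix
s_theta = np.zeros(d)  # sum of reward-weighted actions
s_a = np.zeros(d)      # sum of constraint-weighted actions
beta = 2.0             # confidence radius (a tuning constant here, not a derived bound)

for t in range(T):
    Vinv = np.linalg.inv(V)
    theta_hat, a_hat = Vinv @ s_theta, Vinv @ s_a
    # Per-action confidence width: beta * sqrt(x^T V^{-1} x).
    widths = beta * np.sqrt(np.einsum('ij,jk,ik->i', actions, Vinv, actions))
    # Pessimism for safety: only actions whose upper constraint bound is <= tau.
    safe = (actions @ a_hat + widths) <= tau
    if safe.any():
        # Optimism for reward: maximize the UCB over the certified-safe actions.
        ucb = actions @ theta_hat + widths
        idx = int(np.argmax(np.where(safe, ucb, -np.inf)))
    else:
        # Fall back to the least risky action; real algorithms typically
        # assume a known safe action to play instead.
        idx = int(np.argmin(actions @ a_hat + widths))
    x = actions[idx]
    r = x @ theta_true + 0.1 * rng.normal()  # noisy reward feedback
    c = x @ a_true + 0.1 * rng.normal()      # noisy constraint feedback
    V += np.outer(x, x)
    s_theta += r * x
    s_a += c * x
```

The sketch only illustrates the interaction protocol and the conservative safe-set restriction; per the abstract, the paper's directional-optimism approach improves on this kind of baseline to obtain sharper regret guarantees for well-separated instances and finite star-convex action sets.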