Approximately Stationary Bandits with Knapsacks (2302.14686v2)
Abstract: Bandits with Knapsacks (BwK), the generalization of the Multi-Armed Bandits problem to settings with global budget constraints, has received a lot of attention in recent years. Previous work has focused on one of two extremes: Stochastic BwK, where the rewards and resource consumptions in each round are sampled i.i.d. from a fixed distribution, and Adversarial BwK, where these parameters are chosen by an adversary. The achievable guarantees in the two cases exhibit a massive gap: no-regret learning is achievable in the stochastic case, while in the adversarial case only competitive-ratio-style guarantees are possible, where the competitive ratio depends either on the budget or on both the time horizon and the number of resources. What makes this gap so vast is that in Adversarial BwK the guarantees get worse in the typical case where the budget is more binding. While "best-of-both-worlds" algorithms are known (single algorithms that provide the best achievable guarantee in each extreme case), their bounds degrade to the adversarial guarantee as soon as the environment is not fully stochastic. Our work aims to bridge this gap, offering guarantees for workloads that are not exactly stochastic but are also not worst-case. We define a condition, Approximately Stationary BwK, that parameterizes how close an instance is to the stochastic or the adversarial extreme. Based on these parameters, we study the best competitive ratio attainable in BwK. We explore two algorithms that are oblivious to the values of the parameters but guarantee competitive ratios that smoothly transition between the best possible guarantees in the two extreme cases, depending on those values. Our guarantees offer a substantial improvement over the adversarial guarantee, especially when the available budget is small. We also prove bounds on the best achievable guarantee, showing that our results are approximately tight when the budget is small.
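To make the setting concrete, below is a minimal sketch of the BwK interaction loop in Python. It is illustrative only, not the paper's algorithm or notation: the names (`n_arms`, `n_resources`, `budget`, `horizon`, the instance generator) are assumptions made here for the example. In the stochastic extreme each round's rewards and consumptions come i.i.d. from one fixed distribution; the adversarial extreme would replace that generator with an arbitrary sequence. The learner pulls one arm per round, observes only that arm's outcome (bandit feedback), and the process stops once any resource budget is exhausted.

```python
import numpy as np

# Illustrative BwK interaction loop (a sketch, not the paper's method).
# Each round yields, for every arm, a reward in [0, 1] and a consumption
# in [0, 1] of each of the resources; the learner sees only the outcome
# of the arm it pulled and stops when any resource budget runs out.

def stochastic_instance(rng, n_arms, n_resources):
    """Stochastic extreme: one fixed outcome distribution for all rounds."""
    mean_reward = rng.uniform(size=n_arms)
    mean_consumption = rng.uniform(size=(n_arms, n_resources))
    def round_outcomes():
        rewards = rng.binomial(1, mean_reward).astype(float)
        consumptions = rng.binomial(1, mean_consumption).astype(float)
        return rewards, consumptions
    return round_outcomes

def run_bwk(policy, round_outcomes, n_resources, budget, horizon):
    """Play until the horizon or until any resource is exhausted."""
    remaining = np.full(n_resources, float(budget))
    total_reward = 0.0
    for t in range(horizon):
        arm = policy.choose(t)
        rewards, consumptions = round_outcomes()
        if np.any(consumptions[arm] > remaining):
            break  # a budget would be exceeded: the process stops
        remaining -= consumptions[arm]
        total_reward += rewards[arm]
        policy.update(t, arm, rewards[arm], consumptions[arm])
    return total_reward

class UniformPolicy:
    """Trivial baseline: pull arms uniformly at random."""
    def __init__(self, rng, n_arms):
        self.rng, self.n_arms = rng, n_arms
    def choose(self, t):
        return self.rng.integers(self.n_arms)
    def update(self, t, arm, reward, consumption):
        pass  # a real learner would update its estimates here

rng = np.random.default_rng(0)
outcomes = stochastic_instance(rng, n_arms=3, n_resources=2)
print(run_bwk(UniformPolicy(rng, 3), outcomes,
              n_resources=2, budget=50, horizon=1000))
```

For orientation, a competitive ratio of alpha in this setting means, roughly, that the algorithm's expected total reward is at least a 1/alpha fraction of the benchmark's reward, up to lower-order terms; the stochastic extreme admits the stronger additive no-regret guarantee instead.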