Approximately Stationary Bandits with Knapsacks (2302.14686v2)

Published 28 Feb 2023 in cs.LG and stat.ML

Abstract: Bandits with Knapsacks (BwK), the generalization of the bandits problem to global budget constraints, has received significant attention in recent years. Previous work has focused on one of two extremes: Stochastic BwK, where the rewards and resource consumptions in each round are sampled i.i.d. from a fixed distribution, and Adversarial BwK, where these parameters are picked by an adversary. The achievable guarantees in the two cases exhibit a massive gap: no-regret learning is achievable in the stochastic case, but only competitive-ratio-style guarantees are achievable in the adversarial case, where the competitive ratio depends either on the budget or on both the time horizon and the number of resources. What makes this gap so vast is that in Adversarial BwK the guarantees worsen in the typical case, when the budget is more binding. While "best-of-both-worlds" algorithms are known (single algorithms that provide the best achievable guarantee in each extreme case), their bounds degrade to the adversarial guarantee as soon as the environment is not fully stochastic. Our work aims to bridge this gap, offering guarantees for workloads that are not exactly stochastic but are also not worst-case. We define a condition, Approximately Stationary BwK, that parameterizes how close an instance is to stochastic or adversarial. Based on these parameters, we study the best competitive ratio attainable in BwK. We present two algorithms that are oblivious to the parameter values but guarantee competitive ratios that smoothly interpolate between the best possible guarantees in the two extreme cases, depending on those values. Our guarantees substantially improve over the adversarial guarantee, especially when the available budget is small. We also prove bounds on the achievable guarantee, showing that our results are approximately tight when the budget is small.
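To make the setting concrete, below is a minimal Python sketch of the Stochastic BwK interaction loop described in the abstract: in each round the learner pulls an arm, observes a reward and a per-resource consumption, and stops once any resource budget is exhausted. The environment parameters and the epsilon-greedy baseline are illustrative assumptions for exposition only; they are not the paper's algorithms, which must additionally trade reward off against consumption.

import random

# Minimal sketch of the Stochastic Bandits-with-Knapsacks (BwK) loop.
# The environment and the epsilon-greedy baseline are illustrative
# assumptions; they are not the algorithms analyzed in the paper.

K = 3        # number of arms
D = 2        # number of resources
B = 200.0    # per-resource budget
T = 10_000   # time horizon

# Hidden i.i.d. environment: per-arm mean reward and mean consumption
# (Stochastic BwK). In Adversarial BwK these would instead be arbitrary
# per-round values chosen by an adversary.
mean_reward = [0.9, 0.5, 0.2]
mean_cost = [[0.8, 0.6], [0.3, 0.4], [0.1, 0.1]]

pulls = [0] * K
reward_sum = [0.0] * K
remaining = [B] * D
total_reward = 0.0

for t in range(T):
    # The budget is a hard constraint: stop when any resource runs out.
    if any(r <= 0 for r in remaining):
        break
    # Epsilon-greedy on empirical reward; a real BwK algorithm would also
    # account for consumption (e.g., via Lagrangian duals).
    if 0 in pulls:
        arm = pulls.index(0)          # pull each arm once first
    elif random.random() < 0.1:
        arm = random.randrange(K)     # explore
    else:
        arm = max(range(K), key=lambda a: reward_sum[a] / pulls[a])
    # Bernoulli reward and consumptions drawn from the hidden means.
    reward = 1.0 if random.random() < mean_reward[arm] else 0.0
    costs = [1.0 if random.random() < mean_cost[arm][j] else 0.0
             for j in range(D)]
    pulls[arm] += 1
    reward_sum[arm] += reward
    total_reward += reward
    for j in range(D):
        remaining[j] -= costs[j]

print(f"stopped after {t} rounds with total reward {total_reward:.0f}")

In the adversarial variant of this loop, the fixed means would be replaced by arbitrary per-round values chosen by an adversary; per the abstract, the Approximately Stationary condition parameterizes how far an instance may sit between these two extremes.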
