Bandits with Replenishable Knapsacks: the Best of both Worlds (2306.08470v1)

Published 14 Jun 2023 in cs.LG and stat.ML

Abstract: The bandits with knapsacks (BwK) framework models online decision-making problems in which an agent makes a sequence of decisions subject to resource consumption constraints. The traditional model assumes that each action consumes a non-negative amount of resources and the process ends when the initial budgets are fully depleted. We study a natural generalization of the BwK framework which allows non-monotonic resource utilization, i.e., resources can be replenished by a positive amount. We propose a best-of-both-worlds primal-dual template that can handle any online learning problem with replenishment for which a suitable primal regret minimizer exists. In particular, we provide the first positive results for the case of adversarial inputs by showing that our framework guarantees a constant competitive ratio $\alpha$ when $B=\Omega(T)$ or when the possible per-round replenishment is a positive constant. Moreover, under a stochastic input model, our algorithm yields an instance-independent $\tilde{O}(T^{1/2})$ regret bound which complements existing instance-dependent bounds for the same setting. Finally, we provide applications of our framework to some economic problems of practical relevance.
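
To make the primal-dual template concrete, here is a minimal illustrative sketch of how such a scheme typically operates; it is an assumption-laden toy, not the paper's exact algorithm. A primal regret minimizer (assumed here to be EXP3) is run on Lagrangian payoffs, while a dual multiplier for the budget constraint is updated by online gradient ascent. The environment model, step sizes, and the projection interval [0, 1/rho] are all choices made for illustration; negative consumption models replenishment.

```python
import numpy as np

# Illustrative primal-dual loop for bandits with (replenishable) knapsacks.
# Everything here is a placeholder assumption: EXP3 as the primal regret
# minimizer, online gradient ascent on the dual, a single resource, and
# stochastic feedback drawn uniformly at random.

rng = np.random.default_rng(0)

K, T, B = 5, 10_000, 2_000                  # actions, horizon, initial budget
rho = B / T                                  # per-round budget rate
eta_dual = 1.0 / np.sqrt(T)                  # dual step size
eta_primal = np.sqrt(np.log(K) / (K * T))    # EXP3 learning rate

weights = np.ones(K)                         # primal (EXP3) weights
lam = 0.0                                    # dual multiplier for the resource
budget = float(B)

for t in range(T):
    if budget <= 1.0:                        # stop before the resource is overdrawn
        break
    probs = weights / weights.sum()
    a = rng.choice(K, p=probs)

    # Placeholder feedback: reward in [0, 1], signed consumption in [-1, 1];
    # negative consumption replenishes the budget.
    reward = rng.uniform(0.0, 1.0)
    cost = rng.uniform(-0.2, 1.0)
    budget -= cost

    # Primal step: EXP3 update on the Lagrangian payoff reward - lam * cost,
    # via the standard importance-weighted estimator (payoffs treated as
    # unbounded for simplicity in this sketch).
    est = (reward - lam * cost) / probs[a]
    weights[a] *= np.exp(eta_primal * est)
    weights /= weights.max()                 # numerical stabilization

    # Dual step: gradient ascent on the constraint violation (cost - rho),
    # projected back onto [0, 1/rho].
    lam = min(max(lam + eta_dual * (cost - rho), 0.0), 1.0 / rho)
```

At this level of abstraction, the abstract's $\tilde{O}(T^{1/2})$ instance-independent bound under stochastic inputs is of the order one would expect from combining the sublinear regret of the primal and dual components, though the paper's actual guarantees come from its more general template.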

Authors (4)
  1. Martino Bernasconi (19 papers)
  2. Matteo Castiglioni (60 papers)
  3. Andrea Celli (39 papers)
  4. Federico Fusco (29 papers)
Citations (1)
