Bandits with Replenishable Knapsacks: the Best of Both Worlds (2306.08470v1)
Abstract: The bandits with knapsacks (BwK) framework models online decision-making problems in which an agent makes a sequence of decisions subject to resource consumption constraints. The traditional model assumes that each action consumes a non-negative amount of resources and that the process ends when the initial budgets are fully depleted. We study a natural generalization of the BwK framework which allows non-monotonic resource utilization, i.e., resources can be replenished by a positive amount. We propose a best-of-both-worlds primal-dual template that can handle any online learning problem with replenishment for which a suitable primal regret minimizer exists. In particular, we provide the first positive results for the case of adversarial inputs by showing that our framework guarantees a constant competitive ratio $\alpha$ when $B=\Omega(T)$ or when the possible per-round replenishment is a positive constant. Moreover, under a stochastic input model, our algorithm yields an instance-independent $\tilde{O}(T^{1/2})$ regret bound which complements existing instance-dependent bounds for the same setting. Finally, we provide applications of our framework to some economic problems of practical relevance.
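The primal-dual template described in the abstract pairs a primal regret minimizer over actions with a dual update on a Lagrange multiplier for the resource constraint. The sketch below is only an illustrative simplification, not the paper's algorithm: it uses a single resource, full-information Hedge as the primal regret minimizer (the paper's bandit-feedback setting would use, e.g., EXP3), and a projected-gradient dual step toward the per-round budget rate $\rho = B/T$; all function and parameter names are hypothetical.

```python
import math
import random


def primal_dual_bwk(rewards, costs, B, seed=0):
    """Illustrative primal-dual loop for a knapsack constraint with
    replenishment: per-round costs may be negative, which adds budget back.

    rewards[t][a], costs[t][a] -- payoff and resource consumption of arm a
    at round t (full feedback here, a simplification of the bandit setting).
    """
    rng = random.Random(seed)
    T, K = len(rewards), len(rewards[0])
    rho = B / T                      # per-round budget rate
    lam, lam_cap = 0.0, 1.0 / rho    # dual variable and a standard cap on it
    eta_dual = 1.0 / math.sqrt(T)    # dual step size
    eta_primal = math.sqrt(math.log(K) / T)  # Hedge learning rate
    weights = [1.0] * K
    budget, total_reward = float(B), 0.0

    for t in range(T):
        if budget <= 0:              # hard stop once the resource is gone
            break
        total = sum(weights)
        probs = [w / total for w in weights]
        a = rng.choices(range(K), weights=probs)[0]  # sample from primal
        total_reward += rewards[t][a]
        budget -= costs[t][a]        # negative cost = replenishment
        budget = min(budget, B)      # cap replenishment at the initial budget
        # primal update on the Lagrangian payoff r - lam * c
        for i in range(K):
            payoff = rewards[t][i] - lam * costs[t][i]
            weights[i] *= math.exp(eta_primal * payoff)
        # dual update: projected gradient step toward the rate rho
        lam = min(max(lam + eta_dual * (costs[t][a] - rho), 0.0), lam_cap)

    return total_reward, budget
```

On an instance where one arm consumes budget and the other replenishes it, the dual variable rises as spending exceeds $\rho$, shifting the primal distribution toward cheaper (or replenishing) arms; this is the mechanism that lets the template run past the horizon a consume-only policy would allow.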
- Martino Bernasconi
- Matteo Castiglioni
- Andrea Celli
- Federico Fusco