Beyond Primal-Dual Methods in Bandits with Stochastic and Adversarial Constraints (2405.16118v1)

Published 25 May 2024 in cs.LG

Abstract: We address a generalization of the bandits with knapsacks problem, where a learner aims to maximize rewards while satisfying an arbitrary set of long-term constraints. Our goal is to design best-of-both-worlds algorithms that perform optimally under both stochastic and adversarial constraints. Previous works address this problem via primal-dual methods and require stringent assumptions, namely Slater's condition; moreover, in adversarial settings, they either assume knowledge of a lower bound on Slater's parameter or impose strong requirements on the primal and dual regret minimizers, such as weak adaptivity. We propose an alternative and more natural approach based on optimistic estimations of the constraints. Surprisingly, we show that estimating the constraints with a UCB-like approach guarantees optimal performance. Our algorithm consists of two main components: (i) a regret minimizer working on \emph{moving strategy sets} and (ii) an estimate of the feasible set as an optimistic weighted empirical mean of previous samples. The key challenge in this approach is designing adaptive weights that meet the different requirements of stochastic and adversarial constraints. Our algorithm is significantly simpler than previous approaches and has a cleaner analysis. Moreover, ours is the first best-of-both-worlds algorithm providing bounds logarithmic in the number of constraints. Additionally, in stochastic settings, it provides $\widetilde O(\sqrt{T})$ regret \emph{without} Slater's condition.
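
To make the second component of the abstract more concrete, below is a minimal, hedged Python sketch of a UCB-style optimistic estimate of constraint costs that induces an (optimistic) moving set of feasible arms. It is illustrative only: the class name, the uniform (unweighted) empirical mean, and the Hoeffding-style confidence radius are assumptions made for the sketch, not the paper's adaptive weighting scheme, and the regret minimizer over moving strategy sets is not reproduced here.

```python
import numpy as np

class OptimisticConstraintEstimator:
    """Illustrative sketch (not the paper's exact algorithm).

    Keeps a per-arm empirical mean of observed constraint costs in [0, 1]
    and returns optimistic (lower-confidence) cost estimates, so that the
    induced feasible set over-approximates the true one.
    """

    def __init__(self, n_arms, n_constraints, delta=0.05):
        self.counts = np.zeros(n_arms)                    # pulls per arm
        self.cost_sums = np.zeros((n_arms, n_constraints))  # summed costs
        self.delta = delta                                # confidence level

    def update(self, arm, costs):
        # costs: observed per-constraint costs in [0, 1] for the pulled arm.
        self.counts[arm] += 1.0
        self.cost_sums[arm] += np.asarray(costs, dtype=float)

    def optimistic_costs(self, t):
        # Empirical mean minus a UCB-style confidence radius, clipped at 0.
        # Lower estimated cost means "more feasible", hence optimism.
        n = np.maximum(self.counts, 1.0)
        mean = self.cost_sums / n[:, None]
        radius = np.sqrt(2.0 * np.log(max(t, 2) / self.delta) / n)
        return np.clip(mean - radius[:, None], 0.0, 1.0)

    def feasible_arms(self, t, budgets):
        # Arms whose optimistic per-round costs fit within the per-round
        # budgets define the optimistic strategy set for round t.
        opt = self.optimistic_costs(t)
        return np.where((opt <= np.asarray(budgets)).all(axis=1))[0]
```

In this sketch, a regret minimizer would be restricted at each round to the arms returned by `feasible_arms`; the paper's actual contribution lies in choosing adaptive, non-uniform weights for the empirical estimate so that the same construction works under both stochastic and adversarial constraints.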
