Adaptive Regret for Bandits Made Possible: Two Queries Suffice (2401.09278v1)

Published 17 Jan 2024 in cs.LG

Abstract: Fast-changing states or volatile environments pose a significant challenge to online optimization, which must adapt rapidly under limited observation. In this paper, we give query- and regret-optimal bandit algorithms under the strict notion of strongly adaptive regret, which measures the maximum regret over any contiguous interval $I$. Due to its worst-case nature, there is an almost-linear $\Omega(|I|^{1-\epsilon})$ regret lower bound when only one query per round is allowed [Daniely et al., ICML 2015]. Surprisingly, with just two queries per round, we give the Strongly Adaptive Bandit Learner (StABL), which achieves $\tilde{O}(\sqrt{n|I|})$ adaptive regret for multi-armed bandits with $n$ arms. The bound is tight and cannot be improved in general. Our algorithm leverages a multiplicative update scheme with varying stepsizes and a carefully chosen observation distribution to control the variance. Furthermore, we extend our results and provide optimal algorithms in the bandit convex optimization setting. Finally, we empirically demonstrate the superior performance of our algorithms in volatile environments and on downstream tasks such as algorithm selection for hyperparameter optimization.
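The core mechanism the abstract describes, a multiplicative-weights update over arms combined with a second per-round query used to build a low-variance loss estimate, can be sketched as follows. This is an illustrative simplification under assumed parameters (a single fixed stepsize `eta` and a uniform observation distribution), not the paper's StABL algorithm, which uses varying stepsizes across interval scales and a carefully chosen observation distribution:

```python
import random
import math

def mw_two_query_bandit(losses, n_arms, n_rounds, eta=0.1):
    """Sketch of a multiplicative-weights bandit loop with two queries
    per round: one query plays an arm, the other observes an extra arm
    to form an importance-weighted loss estimate.

    losses: callable (round, arm) -> loss in [0, 1].
    Returns the average loss incurred by the played arms.
    """
    weights = [1.0] * n_arms
    total_loss = 0.0
    for t in range(n_rounds):
        z = sum(weights)
        probs = [w / z for w in weights]
        # Query 1: play an arm drawn from the multiplicative-weights
        # distribution and suffer its loss.
        played = random.choices(range(n_arms), weights=probs)[0]
        total_loss += losses(t, played)
        # Query 2: observe one extra arm drawn uniformly, and build an
        # unbiased loss estimate (importance-weighted against the
        # uniform observation distribution) to control variance.
        obs = random.randrange(n_arms)
        est = [0.0] * n_arms
        est[obs] = losses(t, obs) * n_arms
        # Multiplicative update on the estimated losses.
        weights = [w * math.exp(-eta * est[i]) for i, w in enumerate(weights)]
    return total_loss / n_rounds
```

Decoupling the observation query from the played arm is what makes the estimator's variance controllable independently of the playing distribution; with only the single play query, the estimator variance blows up on arms that are rarely played, which is the source of the one-query lower bound.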

References (25)
  1. Optimal algorithms for online convex optimization with multi-point bandit feedback. In Conference on Learning Theory (COLT), 2010.
  2. The non-stochastic multi-armed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.
  3. Optimal dynamic regret in exp-concave online learning. In Conference on Learning Theory (COLT), 2021.
  4. Optimal dynamic regret in proper online learning with strongly convex losses and beyond. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.
  5. Online learning and bandits with queried hints. In 14th Innovations in Theoretical Computer Science Conference (ITCS), 2023.
  6. Tracking a small set of experts by mixing past posteriors. Journal of Machine Learning Research, 3:363–396, 2002.
  7. Kernel-based methods for bandit convex optimization. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC), 2017.
  8. Strongly adaptive online learning. In International Conference on Machine Learning (ICML), 2015.
  9. Benchmarking optimization software with performance profiles. Mathematical Programming, 91:201–213, 2002.
  10. Online convex optimization in the bandit setting: gradient descent without a gradient. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2005.
  11. Elad Hazan. Introduction to online convex optimization. Foundations and Trends® in Optimization, 2(3-4):157–325, 2016.
  12. Efficient learning algorithms for changing environments. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML), 2009.
  13. Tracking the best expert. Machine Learning, 32(2):151–178, 1998.
  14. Improved strongly adaptive online learning using coin betting. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2017.
  15. On the computational efficiency of adaptive and dynamic regret minimization. arXiv preprint arXiv:2207.00646, 2023.
  16. Adaptive gradient methods with local guarantees. arXiv preprint arXiv:2203.01400, 2022.
  17. Coin betting and parameter-free online learning. Advances in Neural Information Processing Systems (NIPS), 2016.
  18. COCO: The bi-objective black box optimization benchmarking (bbob-biobj) test suite. arXiv preprint, 2016.
  19. Firefly algorithm: recent advances and applications. International Journal of Swarm Intelligence, 1(1):36–50, 2013.
  20. Improved dynamic regret for non-degenerate functions. Advances in Neural Information Processing Systems (NIPS), 2017.
  21. Dynamic regret of strongly adaptive methods. In International Conference on Machine Learning (ICML), 2018.
  22. Minimizing dynamic regret and adaptive regret simultaneously. In International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 309–319, 2020.
  23. Dynamic regret of convex and smooth functions. Advances in Neural Information Processing Systems (NeurIPS), 2020.
  24. Bandit convex optimization in non-stationary environments. The Journal of Machine Learning Research, 22(1):5562–5606, 2021.
  25. Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the Twentieth International Conference on Machine Learning (ICML), 2003.
