The SMART approach to instance-optimal online learning (2402.17720v1)

Published 27 Feb 2024 in cs.LG, cs.DS, cs.IT, and math.IT

Abstract: We devise an online learning algorithm -- titled Switching via Monotone Adapted Regret Traces (SMART) -- that adapts to the data and achieves regret that is instance optimal, i.e., simultaneously competitive on every input sequence compared to the performance of the follow-the-leader (FTL) policy and the worst-case guarantee of any other input policy. We show that the regret of the SMART policy on any input sequence is within a multiplicative factor $e/(e-1) \approx 1.58$ of the smaller of: 1) the regret obtained by FTL on the sequence, and 2) the upper bound on regret guaranteed by the given worst-case policy. This implies a strictly stronger guarantee than typical "best-of-both-worlds" bounds, as the guarantee holds for every input sequence regardless of how it is generated. SMART is simple to implement as it begins by playing FTL and switches at most once during the time horizon to the worst-case algorithm. Our approach and results follow from an operational reduction of instance-optimal online learning to competitive analysis for the ski-rental problem. We complement our competitive ratio upper bounds with a fundamental lower bound showing that over all input sequences, no algorithm can get better than a $1.43$-fraction of the minimum regret achieved by FTL and the minimax-optimal policy. We also present a modification of SMART that combines FTL with a "small-loss" algorithm to achieve instance optimality between the regret of FTL and the small-loss regret bound.
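
To make the switching rule concrete, below is a minimal Python sketch of a SMART-style policy in the full-information experts setting. The interface (smart, worst_case_action, threshold) is illustrative, not the paper's notation, and the fixed deterministic threshold is an assumption: the paper calibrates (and randomizes) the switching threshold against the worst-case regret bound to obtain the $e/(e-1)$ factor, whereas a fixed break-even threshold mirrors deterministic ski rental.

```python
import numpy as np

def smart(loss_matrix, worst_case_action, threshold):
    """Sketch of a SMART-style switching policy (hypothetical interface).

    loss_matrix: T x K array; loss_matrix[t, k] is the loss of expert k
        at round t, revealed after the action at round t is chosen.
    worst_case_action: callable(history) -> expert index; the given
        worst-case algorithm, used exclusively after the switch.
    threshold: switch once FTL's running regret crosses this value.
    """
    T, K = loss_matrix.shape
    cum_loss = np.zeros(K)   # cumulative loss of each expert
    ftl_loss = 0.0           # loss accumulated while playing FTL
    total_loss = 0.0
    switched = False

    for t in range(T):
        if not switched:
            action = int(np.argmin(cum_loss))            # follow the leader
        else:
            action = worst_case_action(loss_matrix[:t])  # worst-case fallback
        round_losses = loss_matrix[t]
        total_loss += round_losses[action]
        if not switched:
            ftl_loss += round_losses[action]
        cum_loss += round_losses
        # FTL's regret trace: its loss so far minus the loss of the best
        # single expert in hindsight so far. Switch at most once; never
        # return to FTL afterwards.
        if not switched and ftl_loss - cum_loss.min() >= threshold:
            switched = True
    return total_loss

# Usage sketch: exponential weights as the worst-case fallback (a standard
# choice; the paper allows any policy with a worst-case regret guarantee).
rng = np.random.default_rng(0)
losses = rng.random((1000, 5))

def hedge_action(history, eta=0.05):
    # Sample an expert from exponential weights over cumulative past losses.
    w = np.exp(-eta * history.sum(axis=0)) if len(history) else np.ones(5)
    return int(rng.choice(len(w), p=w / w.sum()))

print(smart(losses, hedge_action, threshold=20.0))
```

In the ski-rental analogy the reduction rests on, each round of FTL regret is "rent" and the worst-case algorithm's regret bound is the "buy" price; the single switch corresponds to the decision to buy.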

Authors (3)
  1. Siddhartha Banerjee
  2. Alankrita Bhatt
  3. Christina Lee Yu
