The Real Price of Bandit Information in Multiclass Classification (2405.10027v2)
Abstract: We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz and Tewari, 2008), where each input belongs to one of $K$ possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary inquiry concerns the dependency on the number of labels $K$: can $T$-step regret bounds in this setting be improved beyond the $\smash{\sqrt{KT}}$ dependence exhibited by existing algorithms? Our main contribution is showing that the minimax regret of bandit multiclass is in fact more nuanced, and is of the form $\smash{\widetilde{\Theta}\left(\min \left\{|H| + \sqrt{T}, \sqrt{KT \log |H|} \right\} \right)}$, where $H$ is the underlying (finite) hypothesis class. In particular, we present a new bandit classification algorithm that guarantees regret $\smash{\widetilde{O}(|H|+\sqrt{T})}$, improving over classical algorithms for moderately-sized hypothesis classes, and give a matching lower bound establishing tightness of the upper bounds (up to log factors) in all parameter regimes.
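To make the comparison between the two regimes of the minimax bound concrete, the following sketch (up to constants and log factors, not part of the original abstract) shows when the new $\widetilde{O}(|H|+\sqrt{T})$ guarantee improves on the classical $\sqrt{KT}$-type bound:

```latex
% The minimax regret is
%   \widetilde{\Theta}\bigl(\min\bigl\{\, |H| + \sqrt{T},\; \sqrt{KT \log |H|} \,\bigr\}\bigr).
% Since \sqrt{T} \le \sqrt{KT \log |H|} always holds (K \ge 1), the first term
% dominates the minimum roughly when
\[
  |H| + \sqrt{T} \;\lesssim\; \sqrt{KT \log |H|}
  \quad\Longleftrightarrow\quad
  |H| \;\lesssim\; \sqrt{KT \log |H|},
\]
% i.e., for moderately sized hypothesis classes with |H| = O(\sqrt{KT}),
% up to logarithmic factors.
```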
- Taming the monster: A fast and simple algorithm for contextual bandits. In International Conference on Machine Learning, pages 1638–1646. PMLR, 2014.
- Uncoupled learning dynamics with $O(\log T)$ swap regret in multiplayer games. Advances in Neural Information Processing Systems, 35:3292–3304, 2022.
- Near-optimal $\Phi$-regret learning in extensive-form games. In International Conference on Machine Learning, pages 814–839. PMLR, 2023.
- Exploration–exploitation tradeoff using variance estimates in multi-armed bandits. Theoretical Computer Science, 410(19):1876–1902, 2009.
- P. Auer and P. M. Long. Structural results about on-line learning models with and without queries. Mach. Learn., 36(3):147–181, 1999. doi: 10.1023/A:1007614417594.
- The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.
- Contextual bandit algorithms with supervised learning guarantees. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 19–26. JMLR Workshop and Conference Proceedings, 2011.
- Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM (JACM), 36(4):929–965, 1989.
- Sparsity, variance and curvature in multi-armed bandits. In Algorithmic Learning Theory, pages 111–127. PMLR, 2018.
- Contextual bandits with linear payoff functions. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 208–214. JMLR Workshop and Conference Proceedings, 2011.
- Network information theory. Elements of information theory, pages 374–458, 1991.
- A. Daniely and T. Helbertal. The price of bandit information in multiclass online classification. In Conference on Learning Theory, pages 93–104. PMLR, 2013.
- Multiclass learnability and the ERM principle. In Proceedings of the 24th Annual Conference on Learning Theory, pages 207–232. JMLR Workshop and Conference Proceedings, 2011.
- Efficient optimal learning for contextual bandits. arXiv preprint arXiv:1106.2369, 2011.
- L. Erez and T. Koren. Best-of-all-worlds bounds for online learning with feedback graphs. In NeurIPS, pages 28511–28521, 2021.
- Parametric bandits: The generalized linear case. Advances in Neural Information Processing Systems, 23, 2010.
- D. Foster and A. Rakhlin. Beyond UCB: Optimal and efficient contextual bandits with regression oracles. In International Conference on Machine Learning, pages 3199–3210. PMLR, 2020.
- Practical contextual bandits with regression oracles. In International Conference on Machine Learning, pages 1539–1548. PMLR, 2018.
- Adapting to misspecification in contextual bandits. Advances in Neural Information Processing Systems, 33:11478–11489, 2020a.
- Instance-dependent complexity of contextual bandits and reinforcement learning: A disagreement-based perspective. arXiv preprint arXiv:2010.03104, 2020b.
- J. Geneson. A note on the price of bandit feedback for mistake-bounded online learning. Theoretical Computer Science, 874:42–45, 2021.
- S. Hanneke and L. Yang. Minimax analysis of active learning. J. Mach. Learn. Res., 16(1):3487–3602, 2015.
- E. Hazan et al. Introduction to online convex optimization. Foundations and Trends® in Optimization, 2(3-4):157–325, 2016.
- T. Jin and H. Luo. Simultaneously learning stochastic and adversarial episodic MDPs with known transition. Advances in Neural Information Processing Systems, 33, 2020.
- The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition. Advances in Neural Information Processing Systems, 34:20491–20502, 2021.
- S. M. Kakade, S. Shalev-Shwartz, and A. Tewari. Efficient bandit algorithms for online multiclass prediction. In Proceedings of the 25th International Conference on Machine Learning, pages 440–447, 2008.
- J. Kwon and V. Perchet. Gains and losses are fundamentally different in regret minimization: The sparse case. The Journal of Machine Learning Research, 17(1):8106–8137, 2016.
- J. Langford and T. Zhang. The epoch-greedy algorithm for contextual multi-armed bandits. Advances in Neural Information Processing Systems, 20, 2007.
- T. Lattimore and C. Szepesvari. Bandit Algorithms. Cambridge University Press, 2020.
- Provably optimal algorithms for generalized linear contextual bandits. In International Conference on Machine Learning, pages 2071–2080. PMLR, 2017.
- P. M. Long. New bounds on the price of bandit feedback for mistake-bounded online multiclass learning. Theor. Comput. Sci., 808:159–163, 2020. doi: 10.1016/J.TCS.2019.11.017.
- H. B. McMahan and M. Streeter. Tighter bounds for multi-armed bandits with expert advice. In COLT, 2009.
- F. Orabona. A modern introduction to online learning. arXiv preprint arXiv:1912.13213, 2019.
- Multiclass online learnability under bandit feedback. arXiv preprint arXiv:2308.04620, 2023.
- A. Slivkins. Introduction to multi-armed bandits. SIGecom Exch., 18(1):28–30, 2020.
- C.-Y. Wei and H. Luo. More adaptive algorithms for adversarial bandits. In Conference On Learning Theory, pages 1263–1291. PMLR, 2018.