Geometry-Aware Approaches for Balancing Performance and Theoretical Guarantees in Linear Bandits (2306.14872v3)
Abstract: This paper is motivated by recent research in the $d$-dimensional stochastic linear bandit literature, which has revealed an unsettling discrepancy: algorithms like Thompson sampling and Greedy demonstrate promising empirical performance, yet this contrasts with their pessimistic theoretical regret bounds. The challenge arises from the fact that while these algorithms may perform poorly in certain problem instances, they generally excel in typical instances. To address this, we propose a new data-driven technique that tracks the geometric properties of the uncertainty ellipsoid around the main problem parameter. This methodology enables us to formulate an instance-dependent frequentist regret bound, which incorporates the geometric information, for a broad class of base algorithms, including Greedy, OFUL, and Thompson sampling. This result allows us to identify and ``course-correct" problem instances in which the base algorithms perform poorly. The course-corrected algorithms achieve the minimax optimal regret of order $\tilde{\mathcal{O}}(d\sqrt{T})$ for a $T$-period decision-making scenario, effectively maintaining the desirable attributes of the base algorithms, including their empirical efficacy. We present simulation results to validate our findings using synthetic and real data.
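The abstract's core idea, monitoring the uncertainty ellipsoid around the parameter estimate and "course-correcting" a greedy base algorithm when its geometry degrades, can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the condition-number threshold and the least-explored-direction exploration step are illustrative assumptions, standing in for the paper's instance-dependent geometric criterion.

```python
import numpy as np

def course_corrected_greedy(theta_star, arms, T, lam=1.0, sigma=0.1,
                            balance_thresh=10.0, seed=0):
    """Greedy linear bandit with a hypothetical geometry-based course correction.

    Tracks the regularized design matrix V_t = lam*I + sum_s x_s x_s^T, whose
    inverse defines the confidence ellipsoid around the ridge estimate
    theta_hat. When the ellipsoid is too elongated (condition number of V_t
    above `balance_thresh`), the next arm is chosen to cover the least-explored
    direction instead of greedily maximizing estimated reward.
    """
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    V = lam * np.eye(d)       # design matrix; its inverse shapes the ellipsoid
    b = np.zeros(d)           # running sum of reward-weighted features
    best_value = max(a @ theta_star for a in arms)
    regret = 0.0
    for _ in range(T):
        theta_hat = np.linalg.solve(V, b)        # ridge estimate of theta_star
        eigvals, eigvecs = np.linalg.eigh(V)     # ascending eigenvalues
        if eigvals[-1] / eigvals[0] > balance_thresh:
            # Course-correct: ellipsoid is elongated, so pick the arm most
            # aligned with the least-explored direction (smallest eigenvalue).
            u = eigvecs[:, 0]
            x = max(arms, key=lambda a: abs(a @ u))
        else:
            x = max(arms, key=lambda a: a @ theta_hat)   # greedy step
        reward = x @ theta_star + sigma * rng.standard_normal()
        V += np.outer(x, x)
        b += reward * x
        regret += best_value - x @ theta_star
    return regret
```

A short usage example: with a couple of fixed arms in 2D, the cumulative regret stays well below the trivial linear bound, since the correction forces the design matrix to stay well-conditioned while greedy steps exploit the estimate.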
Authors: Yuwei Luo, Mohsen Bayati