LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits (2403.03219v2)
Abstract: This study considers the linear contextual bandit problem with independent and identically distributed (i.i.d.) contexts. For this problem, existing studies have proposed Best-of-Both-Worlds (BoBW) algorithms whose regret satisfies $O(\log^2(T))$ for the number of rounds $T$ in a stochastic regime with a suboptimality gap lower-bounded by a positive constant, while satisfying $O(\sqrt{T})$ in an adversarial regime. However, the dependency on $T$ has room for improvement, and the suboptimality-gap assumption can be relaxed. To address these issues, this study proposes an algorithm whose regret satisfies $O(\log(T))$ in the setting where the suboptimality gap is lower-bounded. Furthermore, we introduce a margin condition, a milder assumption on the suboptimality gap, which characterizes the problem difficulty linked to the suboptimality gap via a parameter $\beta \in (0, \infty]$. We then show that the algorithm's regret satisfies $O\left(\left\{\log(T)\right\}^{\frac{1+\beta}{2+\beta}} T^{\frac{1}{2+\beta}}\right)$. Here, $\beta = \infty$ corresponds to the setting of the existing studies, in which the suboptimality gap is lower-bounded by a positive constant, and our regret satisfies $O(\log(T))$ in that case. Our proposed algorithm is based on Follow-The-Regularized-Leader (FTRL) with the Tsallis entropy and is referred to as the $\alpha$-Linear-Contextual (LC)-Tsallis-INF.
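The abstract does not spell out the FTRL update, but the Tsallis-entropy machinery it builds on is standard. Below is a minimal sketch of the $\alpha = 1/2$ Tsallis-INF update in the plain multi-armed setting for illustration, not the paper's contextual algorithm: the $\alpha$-LC-Tsallis-INF additionally conditions the arm distribution on the observed i.i.d. context and uses context-dependent loss estimates. The helper names (`tsallis_inf_weights`, `loss_fn`), the bisection-based normalization, and the learning-rate schedule are our own illustrative choices.

```python
import numpy as np

def tsallis_inf_weights(L_hat, eta, tol=1e-10):
    """FTRL step with 1/2-Tsallis-entropy regularization.

    Returns p with p_i = 4 / (eta * (L_hat_i - x))^2, where the
    normalization constant x < min_i L_hat_i is found by bisection
    so that sum_i p_i = 1 (the solution of the FTRL optimization
    over the probability simplex).
    """
    K = len(L_hat)
    m = np.min(L_hat)
    lo = m - 2.0 * np.sqrt(K) / eta  # here each p_i <= 1/K, so the sum is <= 1
    hi = m - 2.0 / eta               # here the best arm alone gets p_i = 1, sum >= 1
    while hi - lo > tol:
        x = 0.5 * (lo + hi)
        total = np.sum(4.0 / (eta * (L_hat - x)) ** 2)
        if total > 1.0:
            hi = x  # sum is monotone increasing in x, so shrink from above
        else:
            lo = x
    p = 4.0 / (eta * (L_hat - 0.5 * (lo + hi))) ** 2
    return p / p.sum()  # guard against residual numerical error

def run_tsallis_inf(loss_fn, K, T, rng=None):
    """Multi-armed Tsallis-INF with importance-weighted loss estimates.

    loss_fn(t, arm) is a hypothetical callback returning the observed
    loss in [0, 1] of the pulled arm at round t.
    """
    rng = np.random.default_rng() if rng is None else rng
    L_hat = np.zeros(K)  # cumulative unbiased loss estimates
    for t in range(1, T + 1):
        eta = 2.0 / np.sqrt(t)  # standard decreasing learning rate
        p = tsallis_inf_weights(L_hat, eta)
        arm = rng.choice(K, p=p)
        L_hat[arm] += loss_fn(t, arm) / p[arm]  # importance weighting
    return L_hat
```

Run against a stochastic loss function with a constant suboptimality gap, the sampling probabilities of suboptimal arms should decay roughly like $1/(\Delta^2 t)$, which is what yields logarithmic regret in the stochastic regime while the same update remains $O(\sqrt{T})$-robust in the adversarial regime.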
Authors: Masahiro Kato, Shinji Ito