Best-of-Both-Worlds Linear Contextual Bandits (2312.16489v1)
Abstract: This study investigates the problem of $K$-armed linear contextual bandits, an instance of the multi-armed bandit problem, under adversarial corruption. At each round, a decision-maker observes an independent and identically distributed context and then selects an arm based on the context and past observations. After selecting an arm, the decision-maker incurs a loss corresponding to the selected arm. The decision-maker aims to minimize the cumulative loss over the trial. The goal of this study is to develop a strategy that is effective in both stochastic and adversarial environments, with theoretical guarantees. We first formulate the problem by introducing a novel setting of bandits with adversarial corruption, referred to as the contextual adversarial regime with a self-bounding constraint. We assume linear models for the relationship between the loss and the context. We then propose a strategy that extends RealLinExp3 by Neu & Olkhovskaya (2020) and Follow-The-Regularized-Leader (FTRL). The regret of the proposed algorithm is upper-bounded by $O\left(\min\left\{\frac{(\log(T))^3}{\Delta_{*}} + \sqrt{\frac{C(\log(T))^3}{\Delta_{*}}},\ \ \sqrt{T}(\log(T))^2\right\}\right)$, where $T \in \mathbb{N}$ is the number of rounds, $\Delta_{*} > 0$ is the constant minimum gap between the best and suboptimal arms for any context, and $C \in [0, T]$ is an adversarial corruption parameter. This regret upper bound implies $O\left(\frac{(\log(T))^3}{\Delta_{*}}\right)$ in a stochastic environment and $O\left(\sqrt{T}(\log(T))^2\right)$ in an adversarial environment. We refer to our strategy as the Best-of-Both-Worlds (BoBW) RealFTRL, due to its theoretical guarantees in both stochastic and adversarial regimes.
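As a worked reading of the bound (a sketch; the symbol $R_T$ for the regret is an added notational assumption, not taken from the abstract): in the stochastic regime the corruption level is $C = 0$, so the first argument of the minimum collapses to the gap-dependent rate, while in the adversarial regime the minimum is dominated by its second argument.

$$
R_T = O\!\left(\min\left\{\frac{(\log(T))^3}{\Delta_{*}} + \sqrt{\frac{C(\log(T))^3}{\Delta_{*}}},\;\; \sqrt{T}(\log(T))^2\right\}\right)
\;\Longrightarrow\;
\begin{cases}
O\!\left(\dfrac{(\log(T))^3}{\Delta_{*}}\right), & \text{stochastic environment } (C = 0),\\[1.5ex]
O\!\left(\sqrt{T}(\log(T))^2\right), & \text{adversarial environment (second argument of the } \min\text{)}.
\end{cases}
$$

Intermediate corruption levels $0 < C \leq T$ interpolate between the two regimes through the $\sqrt{C(\log(T))^3/\Delta_{*}}$ term in the first argument.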
- Improved algorithms for linear stochastic bandits. In Advances in Neural Information Processing Systems (NeurIPS), 2011.
- Associative reinforcement learning using linear probabilistic concepts. In International Conference on Machine Learning (ICML), 1999.
- An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits. In Conference on Learning Theory (COLT), 2016.
- Online decision making with high-dimensional covariates. Operations Research, 68(1), 2020.
- Contextual bandit algorithms with supervised learning guarantees. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
- The best of both worlds: Stochastic and adversarial bandits. In Conference on Learning Theory (COLT), 2012.
- Efficient and robust high-dimensional linear contextual bandits. In International Joint Conference on Artificial Intelligence (IJCAI), 2020.
- Contextual bandits with linear payoff functions. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
- Stochastic linear optimization under bandit feedback. In Conference on Learning Theory (COLT), 2008.
- Best of both worlds policy optimization. In International Conference on Machine Learning (ICML), 2023.
- Robust stochastic linear contextual bandits under adversarial attacks. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.
- A linear response bandit problem. Stochastic Systems, 2013.
- Better algorithms for stochastic bandits with adversarial corruptions. In Conference on Learning Theory (COLT), 2019.
- Online learning with low rank experts. In Conference on Learning Theory (COLT), 2016.
- Nearly optimal algorithms for linear contextual bandits with adversarial corruptions. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
- Nearly optimal best-of-both-worlds algorithms for online learning with feedback graphs. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
- Improved best-of-both-worlds guarantees for multi-armed bandits: FTRL with general regularizers and multiple optimal arms. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
- Learning hurdles for sleeping experts. ACM Transactions on Computation Theory, 6(3), 2014.
- Best-of-three-worlds analysis for linear bandits with follow-the-regularized-leader algorithm. In Conference on Learning Theory (COLT), 2023.
- The end of optimism? An asymptotic analysis of finite-armed linear bandits. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2017.
- Achieving near instance-optimality and minimax-optimality in stochastic and adversarial linear bandits simultaneously. In International Conference on Machine Learning (ICML), 2021.
- Regret lower bound and optimal algorithm for high-dimensional contextual linear bandit. Electronic Journal of Statistics, 15(2):5652–5695, 2021.
- A contextual-bandit approach to personalized news article recommendation. In International Conference on World Wide Web (WWW), 2010.
- Bypassing the simulator: Near-optimal adversarial linear contextual bandits. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
- Competitive caching with machine learned advice. In International Conference on Machine Learning (ICML), 2018.
- Efficient and robust algorithms for adversarial linear contextual bandits. In Conference on Learning Theory (COLT), 2020.
- Bistro: An efficient relaxation-based method for contextual bandits. In International Conference on Machine Learning (ICML), 2016.
- Linearly parameterized bandits. Mathematics of Operations Research, 35(2):395–411, 2010.
- An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits. In Conference on Learning Theory (COLT), 2017.
- One practical algorithm for both stochastic and adversarial bandits. In International Conference on Machine Learning (ICML), 2014.
- Improved regret bounds for oracle-based adversarial contextual bandits. In Advances in Neural Information Processing Systems (NeurIPS), 2016.
- From ads to interventions: Contextual bandits in mobile health. In Mobile Health: Sensors, Analytic Methods, and Applications, pp. 495–517, 2017.
- Best-of-both-worlds algorithms for partial monitoring. In International Conference on Algorithmic Learning Theory (ALT), 2023a.
- Stability-penalty-adaptive follow-the-regularized-leader: Sparsity, game-dependency, and best-of-both-worlds. In Advances in Neural Information Processing Systems (NeurIPS), 2023b.
- Minimax concave penalized multi-armed bandit model with high-dimensional covariates. In International Conference on Machine Learning (ICML), 2018.
- More adaptive algorithms for adversarial bandits. In Conference on Learning Theory (COLT), 2018.
- Linear contextual bandits with adversarial corruptions, 2021. URL https://openreview.net/forum?id=Wz-t1oOTWa.
- Tsallis-inf: An optimal algorithm for stochastic and adversarial bandits. Journal of Machine Learning Research, 22(1), 2021.
Authors: Masahiro Kato, Shinji Ito