Robust Thompson Sampling Algorithms Against Reward Poisoning Attacks (2410.19705v1)
Abstract: Thompson sampling is one of the most popular learning algorithms for online sequential decision-making problems and has rich real-world applications. However, current Thompson sampling algorithms rely on the assumption that the received rewards are uncorrupted, which may fail in real-world applications where adversarial reward poisoning exists. To make Thompson sampling more reliable, we aim to make it robust to adversarial reward poisoning. The main challenge is that the actual posteriors of the true rewards can no longer be computed, since the agent observes only the corrupted rewards. In this work, we address this problem by computing pseudo-posteriors that are less likely to be manipulated by the attack. We propose Thompson-sampling-based robust algorithms for the popular stochastic and contextual linear bandit settings, both when the agent knows the attacker's budget and when it does not. We theoretically show that our algorithms guarantee near-optimal regret under any attack strategy.
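To make the pseudo-posterior idea concrete, below is a minimal sketch for a K-armed Gaussian bandit with a known corruption budget. The construction shown here, inflating each arm's posterior sampling width by a term covering the largest mean shift a budget-C attacker could induce, is an illustrative assumption, not the paper's exact algorithm; the names `C`, `sigma`, and the variance-inflation rule are all hypothetical.

```python
import numpy as np

# Sketch of a corruption-robust Thompson sampling loop for a K-armed
# Gaussian bandit. Hypothetical pseudo-posterior: the standard Gaussian
# posterior width sigma/sqrt(n) is widened by C/n, the maximum shift a
# total corruption budget C can cause in an arm's running average, so a
# bounded attacker cannot concentrate the sampling distribution on a
# suboptimal arm. The paper's actual construction may differ.

rng = np.random.default_rng(0)
K, T = 5, 2000
C = 10.0             # assumed known total corruption budget
sigma = 1.0          # reward noise standard deviation (assumption)
true_means = rng.normal(0.0, 1.0, K)

counts = np.zeros(K)  # pulls per arm
sums = np.zeros(K)    # (possibly corrupted) reward sums per arm

for t in range(T):
    n = np.maximum(counts, 1.0)
    emp_mean = sums / n
    # Pseudo-posterior width: usual posterior term plus robustness term.
    pseudo_std = np.sqrt(sigma**2 / n) + C / n
    theta = rng.normal(emp_mean, pseudo_std)  # one sample per arm
    a = int(np.argmax(theta))                  # play the sampled-best arm

    # In the threat model, an attacker may perturb this observation
    # subject to the total budget C; here we draw the clean reward.
    reward = true_means[a] + rng.normal(0.0, sigma)
    counts[a] += 1.0
    sums[a] += reward
```

Note that the robustness term `C / n` vanishes as an arm is pulled more often, so the extra exploration it forces is transient, which is consistent with the abstract's claim of near-optimal regret under any attack strategy.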