Near Optimal Adversarial Attacks on Stochastic Bandits and Defenses with Smoothed Responses (2008.09312v8)
Abstract: I study adversarial attacks against stochastic bandit algorithms. At each round, the learner chooses an arm, and a stochastic reward is generated. The adversary strategically adds corruption to the reward, and the learner can only observe the corrupted reward at each round. Two sets of results are presented in this paper. The first set studies optimal attack strategies for the adversary. The adversary has a target arm he wishes to promote, and his goal is to manipulate the learner into choosing this target arm $T - o(T)$ times. I design attack strategies against UCB and Thompson sampling that spend only $\widehat{O}(\sqrt{\log T})$ cost. Matching lower bounds are presented, and the vulnerabilities of UCB, Thompson sampling, and $\varepsilon$-greedy are exactly characterized. The second set studies how the learner can defend against the adversary. Inspired by the literature on smoothed analysis and behavioral economics, I present two simple algorithms that achieve a competitive ratio arbitrarily close to 1.
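The attack setting described above can be illustrated with a minimal simulation. This is a hedged sketch, not the paper's actual attack: the corruption rule (pull down every non-target reward so that arm's corrupted empirical mean stays a fixed margin below the target arm's empirical mean), the Gaussian reward model, and the `margin` parameter are my own illustrative assumptions. Under such a rule, a UCB learner is driven to pull the target arm almost every round while the adversary pays a modest total cost.

```python
import math
import random

def ucb_attack_simulation(means, target, T, margin=0.5, seed=0):
    """Run UCB1 for T rounds against a simple oracle adversary.

    Whenever a non-target arm is pulled, the adversary lowers the observed
    reward just enough that the arm's corrupted empirical mean stays
    `margin` below the target arm's current empirical mean.

    Returns (number of target-arm pulls, total corruption cost paid).
    """
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K    # pulls per arm
    sums = [0.0] * K    # corrupted reward sums, as seen by the learner
    cost = 0.0
    for t in range(T):
        if t < K:
            arm = t     # pull each arm once to initialize
        else:
            ucb = [sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
                   for i in range(K)]
            arm = max(range(K), key=lambda i: ucb[i])
        reward = rng.gauss(means[arm], 1.0)   # true stochastic reward
        if arm != target and counts[target] > 0:
            # Largest reward keeping this arm's mean <= target mean - margin.
            target_mean = sums[target] / counts[target]
            allowed = (target_mean - margin) * (counts[arm] + 1) - sums[arm]
            corrupted = min(reward, allowed)
            cost += reward - corrupted        # corruption is always downward
            reward = corrupted
        counts[arm] += 1
        sums[arm] += reward
    return counts[target], cost
```

For example, `ucb_attack_simulation([0.9, 0.5, 0.2], target=2, T=10000)` promotes the worst arm: UCB ends up pulling it in the vast majority of rounds, even though its true mean is the lowest. The paper's results sharpen this picture, giving attacks with only $\widehat{O}(\sqrt{\log T})$ cost together with matching lower bounds.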
Author: Shiliang Zuo