Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits (2402.13487v1)

Published 21 Feb 2024 in cs.LG and cs.CR

Abstract: Adversarial attacks against stochastic multi-armed bandit (MAB) algorithms have been extensively studied in the literature. In this work, we focus on reward poisoning attacks and find that most existing attacks can be easily detected by our proposed detection method, based on a test of homogeneity, owing to the aggressive nature of their reward manipulations. This motivates us to study the notion of a stealthy attack against stochastic MABs and to investigate the resulting attackability. Our analysis shows that, against two popular MAB algorithms, UCB1 and $\epsilon$-greedy, the success of a stealthy attack depends on the environmental conditions and on the realized reward of the arm pulled in the first round. We also analyze the situation for general MAB algorithms equipped with our attack detection method and find that it is possible to have a stealthy attack that almost always succeeds. This brings new insights into the security risks of MAB algorithms.
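To make the abstract's central claim concrete, the sketch below simulates the kind of setting it describes: a UCB1 learner on a two-armed Bernoulli bandit, an aggressive reward-poisoning attacker that pushes the optimal arm's rewards down, and a split-half mean comparison standing in for a test of homogeneity. This is a minimal illustration under assumed parameters (arm means, horizon, shift size, threshold), not the paper's actual attack or detection procedure.

```python
import math
import random

def ucb1_with_attack(horizon=2000, seed=0):
    """Run UCB1 on a 2-armed Bernoulli bandit while an attacker poisons rewards."""
    rng = random.Random(seed)
    true_means = [0.8, 0.5]   # arm 0 is truly optimal
    target = 1                # the suboptimal arm the attacker wants pulled
    counts = [0, 0]
    history = [[], []]        # observed (possibly poisoned) rewards per arm

    for t in range(1, horizon + 1):
        if 0 in counts:
            arm = counts.index(0)   # initialization: pull each arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(range(2), key=lambda a: sum(history[a]) / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        # Aggressive poisoning: after a short warm-up, shift every reward of
        # the non-target arm down so the learner comes to prefer the target.
        if arm != target and t > 50:
            reward -= 1.0
        counts[arm] += 1
        history[arm].append(reward)
    return counts, history

def flag_attack(rewards, threshold=0.3):
    """Crude stand-in for a test of homogeneity: compare the mean reward of
    the first and second half of an arm's observed sequence; a large gap
    suggests the stream is not i.i.d., i.e. it was likely manipulated."""
    n = len(rewards)
    if n < 4:
        return False
    first, second = rewards[:n // 2], rewards[n // 2:]
    return abs(sum(first) / len(first) - sum(second) / len(second)) > threshold
```

Because the poisoned arm's history mixes clean early pulls with shifted later ones, the split-half check flags it while the untouched target arm passes, illustrating the abstract's point that aggressive manipulations leave a detectable statistical footprint, which is what motivates the stealthiness constraint.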

