
Multi-Agent Bandit Learning through Heterogeneous Action Erasure Channels (2312.14259v2)

Published 21 Dec 2023 in cs.LG, cs.DC, and cs.MA

Abstract: Multi-Armed Bandit (MAB) systems are seeing growing use in multi-agent distributed environments, driving the development of collaborative MAB algorithms. In such settings, communication between the agents executing actions and the central learner making decisions can hinder the learning process. A prevalent challenge in distributed learning is action erasure, often induced by communication delays and/or channel noise: agents may fail to receive the intended action from the learner, which in turn produces misguided feedback. In this paper, we introduce novel algorithms that enable a learner to interact concurrently with distributed agents across heterogeneous action erasure channels, each with its own erasure probability. We show that, whereas existing bandit algorithms suffer linear regret in this setting, our algorithms guarantee sub-linear regret. Our solutions are built on a carefully crafted repetition protocol and on scheduling learning across the heterogeneous channels. To our knowledge, these are the first algorithms capable of learning effectively through heterogeneous action erasure channels. We substantiate the superior performance of our algorithms through numerical experiments, underscoring their practical significance for addressing communication constraints and delays in multi-agent environments.
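
The setting above lends itself to a small simulation. The following is a minimal sketch of the problem model as the abstract describes it, not the authors' algorithm: the arm means, the per-agent erasure probabilities, the log(T)-based repetition count, and the convention that an agent replays its last successfully received action after an erasure are all illustrative assumptions.

```python
# Toy simulation: a central learner sends actions to M agents over
# heterogeneous action erasure channels and mitigates erasures by
# repeating each instruction.  Illustrative sketch only.
import numpy as np

rng = np.random.default_rng(0)

K, M, T = 5, 3, 10_000                  # arms, agents, horizon (all hypothetical)
mu = rng.uniform(0.2, 0.8, size=K)      # Bernoulli arm means (hypothetical)
eps = np.array([0.1, 0.4, 0.7])         # per-agent action erasure probabilities

def repetitions(agent: int) -> int:
    # Resend each action enough times that all copies are erased with
    # probability at most 1/T:  eps**n <= 1/T  <=>  n >= ln(T) / ln(1/eps).
    return int(np.ceil(np.log(T) / np.log(1.0 / eps[agent])))

last_received = np.zeros(M, dtype=int)  # arm each agent replays after an erasure
counts = np.zeros(K)                    # pulls the learner has credited per arm
sums = np.zeros(K)                      # reward totals per arm

for t in range(T // M):
    for agent in range(M):
        # UCB1-style index computed by the central learner.
        ucb = np.where(
            counts > 0,
            sums / np.maximum(counts, 1)
            + np.sqrt(2.0 * np.log(t + 2) / np.maximum(counts, 1)),
            np.inf,
        )
        intended = int(np.argmax(ucb))

        # Repetition protocol: the agent switches arms iff at least one
        # of the n copies survives the erasure channel.
        delivered = any(rng.random() > eps[agent]
                        for _ in range(repetitions(agent)))
        if delivered:
            last_received[agent] = intended

        # The agent pulls whatever arm it actually holds.  The learner,
        # unaware of erasures, credits the reward to the *intended* arm,
        # so a failed delivery yields exactly the "misguided feedback"
        # the abstract describes (rare here thanks to the repetitions).
        played = last_received[agent]
        reward = float(rng.random() < mu[played])
        counts[intended] += 1
        sums[intended] += reward

print("best arm:", int(np.argmax(mu)),
      "| most-pulled arm:", int(np.argmax(counts)))
```

The repetition count n = ceil(ln T / ln(1/eps)) drives the probability that every copy of an instruction is erased below 1/T, which conveys the intuition for why repetition can restore sub-linear regret; note that noisier channels (larger eps) demand more repetitions per action, which is precisely the trade-off the paper's channel scheduling is designed to balance.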

