Cooperative Multi-Agent Graph Bandits: UCB Algorithm and Regret Analysis (2401.10383v2)
Abstract: In this paper, we formulate the multi-agent graph bandit problem as a multi-agent extension of the graph bandit problem introduced by Zhang, Johansson, and Li [CISS 57, 1-6 (2023)]. In our formulation, $N$ cooperative agents travel on a connected graph $G$ with $K$ nodes. Upon arriving at a node, each agent observes a random reward drawn from a node-dependent probability distribution. The system reward is modeled as a weighted sum of the rewards the agents observe, where the weights capture how the reward is transformed when multiple agents sample the same node simultaneously. We propose an Upper Confidence Bound (UCB)-based learning algorithm, Multi-G-UCB, and prove that its expected regret over $T$ steps is bounded by $O(\gamma N\log(T)[\sqrt{KT} + DK])$, where $D$ is the diameter of the graph $G$ and $\gamma$ is a boundedness parameter associated with the weight functions. Finally, we numerically test our algorithm by comparing it to alternative methods.
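As a rough illustration of the reward model described in the abstract, the sketch below simulates one step of the weighted system reward. It is a minimal sketch under our own assumptions: the Gaussian node distributions, the weight function `f`, the one-draw-per-occupied-node convention, and every name are hypothetical, since the abstract fixes neither the reward distributions nor the form of the weights (only their boundedness, via $\gamma$).

```python
import numpy as np

def system_reward(node_choices, node_means, f, rng):
    """One step of the weighted system reward (illustrative sketch).

    node_choices : length-N integer array; node_choices[j] is the node
                   occupied by agent j at this step.
    node_means   : length-K array of node mean rewards (node-dependent
                   distributions; Gaussian here purely for illustration).
    f            : hypothetical weight applied when m agents sample the
                   same node; the abstract only requires boundedness of
                   such weights, captured by gamma in the regret bound.
    rng          : numpy random Generator.
    """
    total = 0.0
    # Count how many agents occupy each chosen node.
    nodes, counts = np.unique(node_choices, return_counts=True)
    for node, m in zip(nodes, counts):
        # One reward draw per occupied node (whether co-located agents
        # share a draw is our assumption), scaled by the weight f(m).
        reward = rng.normal(loc=node_means[node], scale=1.0)
        total += f(m) * reward
    return total

rng = np.random.default_rng(seed=0)
means = np.array([0.2, 0.5, 0.9])    # K = 3 nodes
choices = np.array([2, 2, 0])        # N = 3 agents; two share node 2
print(system_reward(choices, means, f=np.sqrt, rng=rng))
```

Note that this only concretizes the weighted-sum reward; in Multi-G-UCB itself, agents are additionally constrained to move along edges of $G$ and choose destinations via UCB indices rather than fixed assignments.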
- T. Zhang, K. Johansson, and N. Li, “Multi-armed bandit learning on a graph,” in 2023 57th Annual Conference on Information Sciences and Systems (CISS). IEEE, 2023, pp. 1–6.
- A. Slivkins et al., “Introduction to multi-armed bandits,” Foundations and Trends® in Machine Learning, vol. 12, no. 1-2, pp. 1–286, 2019.
- S. Bubeck, N. Cesa-Bianchi, et al., “Regret analysis of stochastic and nonstochastic multi-armed bandit problems,” Foundations and Trends® in Machine Learning, vol. 5, no. 1, pp. 1–122, 2012.
- J. Zhu, R. Sandhu, and J. Liu, “A distributed algorithm for sequential decision making in multi-armed bandit with homogeneous rewards,” in 2020 59th IEEE Conference on Decision and Control (CDC). Jeju, Korea (South): IEEE, Dec 2020, pp. 3078–3083. [Online]. Available: https://ieeexplore.ieee.org/document/9303836/
- M. Chakraborty, K. Y. P. Chua, S. Das, and B. Juba, “Coordinated versus decentralized exploration in multi-agent multi-armed bandits,” in Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence. Melbourne, Australia: International Joint Conferences on Artificial Intelligence Organization, 2017, pp. 164–170. [Online]. Available: https://www.ijcai.org/proceedings/2017/24
- D. Martínez-Rubio, V. Kanade, and P. Rebeschini, “Decentralized cooperative stochastic bandits,” Advances in Neural Information Processing Systems, vol. 32, 2019.
- P.-A. Wang, A. Proutiere, K. Ariu, Y. Jedra, and A. Russo, “Optimal algorithms for multiplayer multi-armed bandits,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2020, pp. 4120–4129.
- P. Landgren, V. Srivastava, and N. E. Leonard, “Distributed cooperative decision-making in multiarmed bandits: Frequentist and bayesian algorithms,” in 2016 IEEE 55th Conference on Decision and Control (CDC). IEEE, 2016, pp. 167–172.
- M. Agarwal, V. Aggarwal, and K. Azizzadenesheli, “Multi-agent multi-armed bandits with limited communication,” The Journal of Machine Learning Research, vol. 23, no. 1, pp. 9529–9552, 2022.
- A. Sankararaman, A. Ganesh, and S. Shakkottai, “Social learning in multi agent multi armed bandits,” Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 3, no. 3, pp. 1–35, 2019.
- R. Chawla, A. Sankararaman, A. Ganesh, and S. Shakkottai, “The gossiping insert-eliminate algorithm for multi-agent bandits,” in International conference on artificial intelligence and statistics. PMLR, 2020, pp. 3471–3481.
- D. Kalathil, N. Nayyar, and R. Jain, “Decentralized learning for multi-player multi-armed bandits,” in 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), Dec 2012, pp. 3960–3965. [Online]. Available: http://arxiv.org/abs/1206.3582
- K. Liu and Q. Zhao, “Distributed learning in multi-armed bandit with multiple players,” IEEE Transactions on Signal Processing, vol. 58, no. 11, pp. 5667–5681, Nov 2010.
- E. Boursier and V. Perchet, “Sic-mmab: Synchronisation involves communication in multiplayer multi-armed bandits,” Advances in Neural Information Processing Systems, vol. 32, 2019.
- W. Chen, Y. Wang, and Y. Yuan, “Combinatorial multi-armed bandit: General framework and applications,” in Proceedings of the 30th International Conference on Machine Learning. PMLR, Feb 2013, pp. 151–159. [Online]. Available: https://proceedings.mlr.press/v28/chen13a.html
- W. Chen, Y. Wang, Y. Yuan, and Q. Wang, “Combinatorial multi-armed bandit and its extension to probabilistically triggered arms,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 1746–1778, 2016.
- S. Wang and W. Chen, “Thompson sampling for combinatorial semi-bandits,” in Proceedings of the 35th International Conference on Machine Learning. PMLR, Jul 2018, pp. 5114–5122. [Online]. Available: https://proceedings.mlr.press/v80/wang18a.html
- Y. Gai, B. Krishnamachari, and R. Jain, “Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations,” IEEE/ACM Transactions on Networking, vol. 20, no. 5, pp. 1466–1478, Oct 2012.
- B. Kveton, Z. Wen, A. Ashkan, and C. Szepesvari, “Tight regret bounds for stochastic combinatorial semi-bandits,” in Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. PMLR, Feb 2015, pp. 535–543. [Online]. Available: https://proceedings.mlr.press/v38/kveton15.html
- J. Cortes, S. Martinez, T. Karatas, and F. Bullo, “Coverage control for mobile sensing networks,” IEEE Transactions on Robotics and Automation, vol. 20, no. 2, pp. 243–255, 2004.
- V. Ramaswamy and J. R. Marden, “A sensor coverage game with improved efficiency guarantees,” in 2016 American Control Conference (ACC). IEEE, 2016, pp. 6399–6404.
- X. Sun, C. G. Cassandras, and X. Meng, “A submodularity-based approach for multi-agent optimal coverage problems,” in 2017 IEEE 56th Annual Conference on Decision and Control (CDC). IEEE, 2017, pp. 4082–4087.
- M. Prajapat, M. Turchetta, M. Zeilinger, and A. Krause, “Near-optimal multi-agent learning for safe coverage control,” Advances in Neural Information Processing Systems, vol. 35, pp. 14998–15012, 2022.
- V. Ramaswamy, D. Paccagnan, and J. R. Marden, “Multiagent maximum coverage problems: The tradeoff between anarchy and stability,” IEEE Transactions on Automatic Control, vol. 67, no. 4, pp. 1698–1712, 2021.
- T. Jaksch, R. Ortner, and P. Auer, “Near-optimal regret bounds for reinforcement learning,” Journal of Machine Learning Research, vol. 11, no. 51, pp. 1563–1600, 2010. [Online]. Available: http://jmlr.org/papers/v11/jaksch10a.html
- Z. Galil, “Efficient algorithms for finding maximum matching in graphs,” ACM Computing Surveys, vol. 18, no. 1, pp. 23–38, Mar 1986. [Online]. Available: https://doi.org/10.1145/6462.6502