
Cooperative Multi-Agent Graph Bandits: UCB Algorithm and Regret Analysis (2401.10383v2)

Published 18 Jan 2024 in cs.LG, cs.MA, and stat.ML

Abstract: In this paper, we formulate the multi-agent graph bandit problem as a multi-agent extension of the graph bandit problem introduced by Zhang, Johansson, and Li [CISS 57, 1-6 (2023)]. In our formulation, $N$ cooperative agents travel on a connected graph $G$ with $K$ nodes. Upon arrival at each node, agents observe a random reward drawn from a node-dependent probability distribution. The reward of the system is modeled as a weighted sum of the rewards the agents observe, where the weights capture some transformation of the reward associated with multiple agents sampling the same node at the same time. We propose an Upper Confidence Bound (UCB)-based learning algorithm, Multi-G-UCB, and prove that its expected regret over $T$ steps is bounded by $O(\gamma N\log(T)[\sqrt{KT} + DK])$, where $D$ is the diameter of graph $G$ and $\gamma$ a boundedness parameter associated with the weight functions. Lastly, we numerically test our algorithm by comparing it to alternative methods.
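The full Multi-G-UCB algorithm is specified in the paper; as a rough illustration of the UCB machinery the abstract describes, below is a minimal Python sketch. The path graph, Bernoulli rewards, additive weights (co-located agents' rewards simply sum), and the one-step greedy movement rule are all illustrative assumptions — the paper's algorithm plans shortest paths over $G$ and handles general weight functions, which can discount multiple agents sampling the same node and thereby couple the agents' choices.

```python
import numpy as np

# Minimal sketch of a UCB-style index for a multi-agent graph bandit.
# Assumptions (not from the paper): Bernoulli node rewards, additive
# weights, and agents that greedily step toward the neighboring node
# with the highest index instead of planning full shortest paths.

rng = np.random.default_rng(0)

K = 5                                   # number of nodes
N = 2                                   # number of agents
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}  # path graph
true_means = rng.uniform(size=K)        # unknown node reward means

counts = np.zeros(K)                    # samples observed per node
sums = np.zeros(K)                      # cumulative reward per node
positions = [0] * N                     # all agents start at node 0
T = 2000

for t in range(1, T + 1):
    # UCB index per node: empirical mean plus exploration bonus;
    # unvisited nodes get an infinite index so they are tried first.
    means = sums / np.maximum(counts, 1)
    bonus = np.sqrt(2 * np.log(t) / np.maximum(counts, 1))
    ucb = np.where(counts > 0, means + bonus, np.inf)

    # Each agent may stay put or move to an adjacent node -- the
    # graph constraint that distinguishes this from a standard bandit.
    for i in range(N):
        options = [positions[i]] + adj[positions[i]]
        positions[i] = max(options, key=lambda v: ucb[v])

    # Upon arrival, each agent observes a reward at its node; with
    # additive weights, co-located agents contribute independent samples.
    for i in range(N):
        v = positions[i]
        counts[v] += 1
        sums[v] += rng.binomial(1, true_means[v])

print("estimated means:", np.round(sums / np.maximum(counts, 1), 2))
print("true means:     ", np.round(true_means, 2))
```

Because agents can only move one edge per step, reaching a promising node can take up to $D$ steps, which is the intuition behind the $DK$ term in the paper's regret bound.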

References (27)
  1. T. Zhang, K. Johansson, and N. Li, “Multi-armed bandit learning on a graph,” in 2023 57th Annual Conference on Information Sciences and Systems (CISS). IEEE, 2023, pp. 1–6.
  2. A. Slivkins et al., “Introduction to multi-armed bandits,” Foundations and Trends® in Machine Learning, vol. 12, no. 1-2, pp. 1–286, 2019.
  3. S. Bubeck, N. Cesa-Bianchi, et al., “Regret analysis of stochastic and nonstochastic multi-armed bandit problems,” Foundations and Trends® in Machine Learning, vol. 5, no. 1, pp. 1–122, 2012.
  4. J. Zhu, R. Sandhu, and J. Liu, “A distributed algorithm for sequential decision making in multi-armed bandit with homogeneous rewards,” in 2020 59th IEEE Conference on Decision and Control (CDC). Jeju, Korea (South): IEEE, Dec 2020, pp. 3078–3083. [Online]. Available: https://ieeexplore.ieee.org/document/9303836/
  5. M. Chakraborty, K. Y. P. Chua, S. Das, and B. Juba, “Coordinated versus decentralized exploration in multi-agent multi-armed bandits,” in Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence. Melbourne, Australia: International Joint Conferences on Artificial Intelligence Organization, 2017, pp. 164–170. [Online]. Available: https://www.ijcai.org/proceedings/2017/24
  6. D. Martínez-Rubio, V. Kanade, and P. Rebeschini, “Decentralized cooperative stochastic bandits,” Advances in Neural Information Processing Systems, vol. 32, 2019.
  7. P.-A. Wang, A. Proutiere, K. Ariu, Y. Jedra, and A. Russo, “Optimal algorithms for multiplayer multi-armed bandits,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2020, pp. 4120–4129.
  8. P. Landgren, V. Srivastava, and N. E. Leonard, “Distributed cooperative decision-making in multiarmed bandits: Frequentist and Bayesian algorithms,” in 2016 IEEE 55th Conference on Decision and Control (CDC). IEEE, 2016, pp. 167–172.
  9. M. Agarwal, V. Aggarwal, and K. Azizzadenesheli, “Multi-agent multi-armed bandits with limited communication,” The Journal of Machine Learning Research, vol. 23, no. 1, pp. 9529–9552, 2022.
  10. A. Sankararaman, A. Ganesh, and S. Shakkottai, “Social learning in multi agent multi armed bandits,” Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 3, no. 3, pp. 1–35, 2019.
  11. R. Chawla, A. Sankararaman, A. Ganesh, and S. Shakkottai, “The gossiping insert-eliminate algorithm for multi-agent bandits,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2020, pp. 3471–3481.
  12. D. Kalathil, N. Nayyar, and R. Jain, “Decentralized learning for multi-player multi-armed bandits,” in 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), Dec 2012, pp. 3960–3965, arXiv:1206.3582 [cs, math]. [Online]. Available: http://arxiv.org/abs/1206.3582
  13. K. Liu and Q. Zhao, “Distributed learning in multi-armed bandit with multiple players,” IEEE Transactions on Signal Processing, vol. 58, no. 11, pp. 5667–5681, Nov 2010.
  14. P.-A. Wang, A. Proutiere, K. Ariu, Y. Jedra, and A. Russo, “Optimal algorithms for multiplayer multi-armed bandits,” in Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics. PMLR, Jun 2020, pp. 4120–4129. [Online]. Available: https://proceedings.mlr.press/v108/wang20m.html
  15. E. Boursier and V. Perchet, “SIC-MMAB: Synchronisation involves communication in multiplayer multi-armed bandits,” Advances in Neural Information Processing Systems, vol. 32, 2019.
  16. W. Chen, Y. Wang, and Y. Yuan, “Combinatorial multi-armed bandit: General framework and applications,” in Proceedings of the 30th International Conference on Machine Learning. PMLR, Feb 2013, pp. 151–159. [Online]. Available: https://proceedings.mlr.press/v28/chen13a.html
  17. W. Chen, Y. Wang, Y. Yuan, and Q. Wang, “Combinatorial multi-armed bandit and its extension to probabilistically triggered arms,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 1746–1778, 2016.
  18. S. Wang and W. Chen, “Thompson sampling for combinatorial semi-bandits,” in Proceedings of the 35th International Conference on Machine Learning. PMLR, Jul 2018, pp. 5114–5122. [Online]. Available: https://proceedings.mlr.press/v80/wang18a.html
  19. Y. Gai, B. Krishnamachari, and R. Jain, “Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations,” IEEE/ACM Transactions on Networking, vol. 20, no. 5, pp. 1466–1478, Oct 2012.
  20. B. Kveton, Z. Wen, A. Ashkan, and C. Szepesvari, “Tight regret bounds for stochastic combinatorial semi-bandits,” in Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. PMLR, Feb 2015, pp. 535–543. [Online]. Available: https://proceedings.mlr.press/v38/kveton15.html
  21. J. Cortes, S. Martinez, T. Karatas, and F. Bullo, “Coverage control for mobile sensing networks,” IEEE Transactions on Robotics and Automation, vol. 20, no. 2, pp. 243–255, 2004.
  22. V. Ramaswamy and J. R. Marden, “A sensor coverage game with improved efficiency guarantees,” in 2016 American Control Conference (ACC). IEEE, 2016, pp. 6399–6404.
  23. X. Sun, C. G. Cassandras, and X. Meng, “A submodularity-based approach for multi-agent optimal coverage problems,” in 2017 IEEE 56th Annual Conference on Decision and Control (CDC). IEEE, 2017, pp. 4082–4087.
  24. M. Prajapat, M. Turchetta, M. Zeilinger, and A. Krause, “Near-optimal multi-agent learning for safe coverage control,” Advances in Neural Information Processing Systems, vol. 35, pp. 14998–15012, 2022.
  25. V. Ramaswamy, D. Paccagnan, and J. R. Marden, “Multiagent maximum coverage problems: The tradeoff between anarchy and stability,” IEEE Transactions on Automatic Control, vol. 67, no. 4, pp. 1698–1712, 2021.
  26. T. Jaksch, R. Ortner, and P. Auer, “Near-optimal regret bounds for reinforcement learning,” Journal of Machine Learning Research, vol. 11, no. 51, pp. 1563–1600, 2010. [Online]. Available: http://jmlr.org/papers/v11/jaksch10a.html
  27. Z. Galil, “Efficient algorithms for finding maximum matching in graphs,” ACM Comput. Surv., vol. 18, no. 1, pp. 23–38, Mar 1986. [Online]. Available: https://doi.org/10.1145/6462.6502
