Distributed Multi-Agent Reinforcement Learning Based on Graph-Induced Local Value Functions (2202.13046v5)
Abstract: Achieving distributed reinforcement learning (RL) for large-scale cooperative multi-agent systems (MASs) is challenging because: (i) each agent has access to only limited information; and (ii) issues of convergence or computational complexity arise due to the curse of dimensionality. In this paper, we propose a general, computationally efficient distributed framework for cooperative multi-agent reinforcement learning (MARL) that exploits the graph structures inherent in the problem. We introduce three coupling graphs describing three types of inter-agent couplings in MARL: the state graph, the observation graph, and the reward graph. By further considering a communication graph, we propose two distributed RL approaches based on local value functions derived from the coupling graphs. The first approach significantly reduces sample complexity under specific conditions on the four graphs. The second approach provides an approximate solution and remains efficient even for problems with dense coupling graphs; here there is a trade-off between minimizing the approximation error and reducing the computational complexity. Simulations show that our RL algorithms scale to large-scale MASs significantly better than centralized and consensus-based distributed RL algorithms.
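The abstract's key idea, local value functions whose scope is determined by the coupling graphs, can be made concrete with a small sketch. The Python snippet below is a minimal illustration under our own assumptions, not the paper's algorithm: the example graphs, the `learning_neighborhood` rule (a simple union of the three neighborhoods), and the `local_return` estimator are hypothetical choices for exposition.

```python
import numpy as np

# Hypothetical coupling graphs over n agents, each given as an adjacency
# map (agent index -> set of neighbor indices). The paper's construction
# of the learning neighborhood may differ; here we simply take the union
# of an agent's neighborhoods across the three graphs.
n_agents = 4
state_graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
observation_graph = {0: {0, 1}, 1: {1}, 2: {2, 3}, 3: {3}}
reward_graph = {0: {0}, 1: {0, 1}, 2: {2}, 3: {2, 3}}

def learning_neighborhood(i):
    """Agents whose variables agent i's local value function depends on."""
    return state_graph[i] | observation_graph[i] | reward_graph[i] | {i}

def local_return(rewards, i, gamma=0.95):
    """Monte-Carlo estimate of agent i's local value: the discounted sum
    of rewards of agents in i's reward-graph neighborhood, in place of
    the global team reward a centralized critic would use.

    rewards: array of shape (T, n_agents), one reward per agent per step.
    """
    scope = reward_graph[i] | {i}
    local_r = rewards[:, sorted(scope)].sum(axis=1)
    discounts = gamma ** np.arange(len(local_r))
    return float(discounts @ local_r)

rng = np.random.default_rng(0)
rewards = rng.random((10, n_agents))
for i in range(n_agents):
    print(i, learning_neighborhood(i), round(local_return(rewards, i), 3))
```

The point of such a construction, as we read the abstract, is that each agent learns over the joint variables of a small neighborhood rather than the full MAS, which is where the claimed reduction in sample and computational complexity would come from when the coupling graphs are sparse.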
Authors: Gangshan Jing, He Bai, Jemin George, Aranya Chakrabortty, Piyush K. Sharma