Distributed Policy Gradient for Linear Quadratic Networked Control with Limited Communication Range (2403.03055v1)
Abstract: This paper proposes a scalable distributed policy gradient method and proves its convergence to a near-optimal solution for multi-agent linear quadratic networked systems. The agents interact over a specified network under local communication constraints, meaning that each agent can exchange information only with a limited number of neighboring agents. On the underlying graph of the network, each agent computes its control input from the states of its nearby neighbors in the linear quadratic control setting. We show that the exact gradient can be approximated using only local information. Compared with the centralized optimal controller, the performance gap decays to zero exponentially as the communication and control ranges increase. We also demonstrate how enlarging the communication range enhances system stability during gradient descent, thereby elucidating a critical trade-off. Simulation results verify our theoretical findings.
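To make the idea of a locality-constrained policy gradient concrete, the following is a minimal sketch, not the authors' exact algorithm: it runs a two-point zeroth-order policy-gradient update for a networked LQR on a chain graph, restricting each agent's feedback gain to the states of its kappa-hop neighbors. The chain topology, scalar per-agent states, the zeroth-order estimator, and all numerical values (n, kappa, horizon, step size) are illustrative assumptions.

```python
# Illustrative sketch of a locality-constrained policy gradient for networked LQR.
# Not the paper's exact method; all parameters are assumed for demonstration.
import numpy as np

rng = np.random.default_rng(0)

n = 6        # number of agents on a chain, one scalar state each (assumed)
kappa = 1    # communication/control range in hops (assumed)
T = 50       # rollout horizon approximating the infinite-horizon cost (assumed)

# Chain-coupled open-loop dynamics x_{t+1} = A x_t + B u_t (assumed values).
A = 0.4 * np.eye(n) + 0.2 * (np.eye(n, k=1) + np.eye(n, k=-1))
B = np.eye(n)
Q, R = np.eye(n), np.eye(n)

# Sparsity mask: agent i may only use states within kappa hops on the chain.
mask = np.array([[1.0 if abs(i - j) <= kappa else 0.0 for j in range(n)]
                 for i in range(n)])

def cost(K, x0):
    """Finite-horizon LQR cost under the static feedback u_t = -K x_t."""
    x, J = x0.copy(), 0.0
    for _ in range(T):
        u = -K @ x
        J += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return J

def zo_gradient(K, num_samples=200, radius=0.05):
    """Two-point zeroth-order gradient estimate of the expected cost,
    with perturbations projected onto the kappa-hop sparsity pattern."""
    g = np.zeros_like(K)
    for _ in range(num_samples):
        U = rng.standard_normal(K.shape) * mask   # perturb feasible entries only
        x0 = rng.standard_normal(n)
        g += (cost(K + radius * U, x0) - cost(K - radius * U, x0)) / (2 * radius) * U
    return g / num_samples

K = 0.1 * mask      # initial gain respecting the local structure (assumed)
eta = 1e-4          # step size (assumed)
for it in range(100):
    K -= eta * zo_gradient(K)
    if it % 20 == 0:
        avg = np.mean([cost(K, rng.standard_normal(n)) for _ in range(50)])
        print(f"iteration {it}: average cost {avg:.3f}")
```

Increasing kappa enlarges the feasible set of gains (the mask becomes denser), which in this toy setup mirrors the paper's trade-off: a wider communication range reduces the gap to the centralized optimal controller at the price of more information exchange per agent.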