Policy Optimization finds Nash Equilibrium in Regularized General-Sum LQ Games (2404.00045v2)
Abstract: In this paper, we investigate the impact of introducing relative entropy regularization on the Nash Equilibria (NE) of general-sum $N$-agent linear-quadratic (LQ) games, showing that the NE policies of such games are linear Gaussian. We also derive sufficient conditions, contingent on the entropy regularization being sufficiently strong, for the uniqueness of the NE. Since policy optimization is a foundational approach for Reinforcement Learning (RL) methods aimed at finding the NE, we prove the linear convergence of a policy optimization algorithm which, provided the entropy regularization is sufficiently strong, provably attains the NE. Furthermore, when the entropy regularization is insufficient, we propose a $\delta$-augmentation technique that enables computing an $\epsilon$-NE of the game.
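For concreteness, below is a minimal sketch of the kind of setting the abstract describes: an entropy-regularized $N$-agent LQ game and the linear Gaussian policy class that the NE is shown to lie in. The notation ($A$, $B^i$, $Q^i$, $R^i$, the regularization weight $\tau$, and the reference policy $\bar\pi^i$) is chosen here for illustration and is not taken verbatim from the paper.

```latex
% Illustrative formulation only; symbols are assumptions, not the paper's exact notation.
% Requires amsmath and amssymb.
% Linear dynamics jointly driven by all N agents' controls:
\[
  x_{t+1} = A x_t + \sum_{j=1}^{N} B^{j} u^{j}_{t} + w_t,
  \qquad w_t \sim \mathcal{N}(0, W).
\]
% Agent i minimizes a quadratic cost plus a relative-entropy (KL) penalty of weight tau
% toward a reference Gaussian policy \bar{\pi}^i:
\[
  J^{i}(\pi^{1},\ldots,\pi^{N})
    = \mathbb{E}\!\left[\sum_{t} x_t^{\top} Q^{i} x_t + (u^{i}_{t})^{\top} R^{i} u^{i}_{t}
      + \tau\, \mathrm{KL}\!\left(\pi^{i}(\cdot \mid x_t) \,\middle\|\, \bar{\pi}^{i}(\cdot \mid x_t)\right)\right].
\]
% The structural result stated in the abstract is that every NE policy is linear Gaussian,
% i.e., a state-feedback mean with a constant covariance induced by the regularization:
\[
  \pi^{i}_{\star}(\cdot \mid x) = \mathcal{N}\!\left(-K^{i} x,\ \Sigma^{i}\right).
\]
```

Under this reading, policy optimization amounts to searching directly over the gains $K^i$ and covariances $\Sigma^i$, which is consistent with the linear Gaussian NE structure the abstract reports.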