Robust Multi-Agent Control via Maximum Entropy Heterogeneous-Agent Reinforcement Learning (2306.10715v5)

Published 19 Jun 2023 in cs.MA and cs.LG

Abstract: In multi-agent reinforcement learning, optimal control with robustness guarantees is critical for deployment in the real world. However, existing methods face challenges related to sample complexity, training instability, potential convergence to suboptimal Nash equilibria, and non-robustness to multiple perturbations. In this paper, we propose a unified framework for learning \emph{stochastic} policies to resolve these issues. We embed cooperative MARL problems into probabilistic graphical models, from which we derive the maximum entropy (MaxEnt) objective optimal for MARL. Based on the MaxEnt framework, we propose the \emph{Heterogeneous-Agent Soft Actor-Critic} (HASAC) algorithm. Theoretically, we prove the monotonic improvement and convergence to \emph{quantal response equilibrium} (QRE) properties of HASAC. Furthermore, HASAC is provably robust against a wide range of real-world uncertainties, including perturbations in rewards, environment dynamics, states, and actions. Finally, we generalize a unified template for MaxEnt algorithmic design named \emph{Maximum Entropy Heterogeneous-Agent Mirror Learning} (MEHAML), which provides any induced method with the same guarantees as HASAC. We evaluate HASAC on seven benchmarks: Bi-DexHands, Multi-Agent MuJoCo, Pursuit-Evade, StarCraft Multi-Agent Challenge, Google Research Football, Multi-Agent Particle Environment, and Light Aircraft Game. Results show that HASAC consistently outperforms strong baselines in 34 out of 38 tasks, exhibiting improved training stability, better sample efficiency, and sufficient exploration. The robustness of HASAC was further validated against uncertainties in rewards, dynamics, states, and actions across 14 perturbation magnitudes, as well as in a real-world multi-robot arena deployment subject to these four types of uncertainties. See our page at \url{https://sites.google.com/view/meharl}.
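As a point of reference for the MaxEnt objective mentioned in the abstract, the standard single-agent entropy-regularized objective extends to $n$ agents by augmenting the return with per-agent policy entropy. This is a sketch of the common formulation; the paper's exact notation (e.g., per-agent temperatures or partial observability) may differ:

```latex
J(\boldsymbol{\pi}) \;=\; \mathbb{E}_{\tau \sim \boldsymbol{\pi}}
\left[ \sum_{t=0}^{\infty} \gamma^{t} \Big( r(s_t, \mathbf{a}_t)
\;+\; \alpha \sum_{i=1}^{n} \mathcal{H}\big(\pi^{i}(\cdot \mid s_t)\big) \Big) \right]
```

Here $\boldsymbol{\pi} = (\pi^{1}, \ldots, \pi^{n})$ is the joint policy, $r$ is the shared cooperative reward, and $\alpha$ is a temperature weighting the entropy bonus that encourages stochastic, exploratory policies.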

Authors (15)
  1. Jiarong Liu
  2. Yifan Zhong
  3. Siyi Hu
  4. Haobo Fu
  5. Qiang Fu
  6. Xiaojun Chang
  7. Yaodong Yang
  8. Simin Li
  9. Jianing Guo
  10. Siyuan Qi
  11. Ruixiao Xu
  12. Xin Yu
  13. Yujing Hu
  14. Bo An
  15. Xianglong Liu
Citations (1)