DeepSafeMPC: Deep Learning-Based Model Predictive Control for Safe Multi-Agent Reinforcement Learning (2403.06397v2)
Abstract: Safe multi-agent reinforcement learning (safe MARL) has gained increasing attention in recent years, emphasizing the need for agents not only to optimize the global return but also to adhere to safety requirements through behavioral constraints. Some recent work has integrated control theory with multi-agent reinforcement learning to address the challenge of ensuring safety. However, applications of Model Predictive Control (MPC) methods in this domain remain very limited, primarily because of the complex and implicit dynamics characteristic of multi-agent environments. To bridge this gap, we propose a novel method called Deep Learning-Based Model Predictive Control for Safe Multi-Agent Reinforcement Learning (DeepSafeMPC). The key insight of DeepSafeMPC is to leverage a centralized deep learning model to accurately predict environmental dynamics. Our method applies MARL principles to search for optimal solutions, while MPC simultaneously restricts the agents' actions so that they remain within safe states. We demonstrate the effectiveness of our approach using the Safe Multi-agent MuJoCo environment, showcasing significant advancements in addressing safety concerns in MARL.
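The interplay the abstract describes (a MARL policy proposes a joint action, a centralized dynamics model predicts its consequences, and an MPC step keeps the agents within safe states) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the function names are hypothetical, a fixed linear map stands in for the learned deep dynamics model, the safety constraint is a simple box on the state, and a random-shooting search replaces whatever optimizer the authors actually use.

```python
import numpy as np

def predict_next_state(state, joint_action):
    """Hypothetical stand-in for the learned centralized dynamics model."""
    return state + 0.1 * joint_action

def safety_cost(state, limit=1.0):
    """Hypothetical constraint: total amount by which |state| exceeds `limit`."""
    return float(np.sum(np.maximum(np.abs(state) - limit, 0.0)))

def rollout_cost(state, joint_action, horizon=5):
    """Accumulated predicted safety cost when the action is held for `horizon` steps."""
    s, cost = np.array(state, dtype=float), 0.0
    for _ in range(horizon):
        s = predict_next_state(s, joint_action)
        cost += safety_cost(s)
    return cost

def mpc_safe_action(state, proposed_action, horizon=5,
                    n_candidates=64, noise=0.5, seed=0):
    """Random-shooting safety filter (an assumed, simplified MPC step).

    Picks the candidate with the lowest predicted safety cost, breaking
    ties by staying as close as possible to the policy's proposal.
    """
    rng = np.random.default_rng(seed)
    proposed_action = np.asarray(proposed_action, dtype=float)
    candidates = proposed_action + noise * rng.standard_normal(
        (n_candidates, proposed_action.size))
    candidates[0] = proposed_action  # always consider the unmodified proposal
    best_action, best_key = None, None
    for action in candidates:
        cost = rollout_cost(state, action, horizon)
        key = (cost, float(np.linalg.norm(action - proposed_action)))
        if best_key is None or key < best_key:
            best_action, best_key = action, key
    return best_action, best_key[0]
```

In this sketch, a proposal that is already predicted to be safe passes through unchanged, while an unsafe proposal is replaced by the nearest sampled action with the lowest predicted constraint violation. The filter is only as reliable as the dynamics model it rolls out, which is why the abstract places the emphasis on predicting environmental dynamics accurately.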