Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving (2405.18209v1)
Abstract: Ensuring safety in multi-agent reinforcement learning (MARL) is a critical challenge, particularly when deploying it in real-world applications such as autonomous driving. Traditional safe MARL methods address this challenge by extending MARL approaches with safety considerations, aiming to minimize safety risk values. However, these algorithms often fail to model other agents and lack convergence guarantees, particularly in dynamically complex environments. In this study, we propose a safe MARL method grounded in a Stackelberg model with bi-level optimization, for which we provide a convergence analysis. Building on this theoretical analysis, we develop two practical algorithms, Constrained Stackelberg Q-learning (CSQ) and Constrained Stackelberg Multi-Agent Deep Deterministic Policy Gradient (CS-MADDPG), designed to facilitate MARL decision-making in autonomous driving applications. To evaluate our algorithms, we developed a safe MARL autonomous driving benchmark and conducted experiments on challenging autonomous driving scenarios, such as merges, roundabouts, intersections, and racetracks. The experimental results indicate that CSQ and CS-MADDPG outperform several strong MARL baselines, such as Bi-AC, MACPO, and MAPPO-L, in both reward and safety performance. The demos and source code are available at https://github.com/SafeRL-Lab/Safe-MARL-in-Autonomous-Driving.git.
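To make the idea of a constrained Stackelberg (leader-follower) decision concrete, here is a minimal single-state sketch in Python. It is not the paper's CSQ or CS-MADDPG algorithm: the tables `q_l`, `q_f`, `c_l`, `c_f`, the thresholds `d_l`, `d_f`, and the function names are all hypothetical, chosen only to illustrate bi-level action selection in which the follower best-responds to the leader and both agents respect a cost limit.

```python
from typing import List, Optional, Tuple

# All tables are indexed [leader_action][follower_action] (hypothetical layout).
Matrix = List[List[float]]


def follower_best_response(a_l: int, q_f: Matrix, c_f: Matrix, d_f: float) -> Optional[int]:
    """Lower level: follower's highest-value action among those within its cost limit."""
    safe = [b for b in range(len(q_f[a_l])) if c_f[a_l][b] <= d_f]
    if not safe:
        return None  # no safe reply exists for this leader action
    return max(safe, key=lambda b: q_f[a_l][b])


def constrained_stackelberg(
    q_l: Matrix, q_f: Matrix, c_l: Matrix, c_f: Matrix, d_l: float, d_f: float
) -> Optional[Tuple[int, int]]:
    """Upper level: leader maximizes its value, anticipating the follower's safe reply."""
    best, best_value = None, float("-inf")
    for a in range(len(q_l)):
        b = follower_best_response(a, q_f, c_f, d_f)
        if b is None or c_l[a][b] > d_l:
            continue  # skip leader actions that lead to an unsafe joint action
        if q_l[a][b] > best_value:
            best, best_value = (a, b), q_l[a][b]
    return best


# Toy merge example (assumed numbers): action 0 = yield, action 1 = go.
q_l = [[1.0, 2.0], [4.0, 5.0]]   # leader values
q_f = [[1.0, 4.0], [2.0, 5.0]]   # follower values
c_l = [[0.0, 0.0], [0.0, 1.0]]   # cost 1.0 when both vehicles go
c_f = [[0.0, 0.0], [0.0, 1.0]]

safe_choice = constrained_stackelberg(q_l, q_f, c_l, c_f, d_l=0.5, d_f=0.5)
free_choice = constrained_stackelberg(q_l, q_f, c_l, c_f, d_l=float("inf"), d_f=float("inf"))
print(safe_choice, free_choice)
```

In this toy setting, the cost limit changes the outcome: with thresholds of 0.5 the leader goes and the follower yields, whereas with no limit both vehicles go. A learning algorithm would additionally have to estimate the value and cost tables from experience, which this sketch leaves out.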
Authors: Zhi Zheng, Shangding Gu