In the evolving field of multi-agent systems (MAS), the challenge of achieving safe coordination among agents has garnered increasing attention, especially as reinforcement learning (RL) is applied to these systems. The paper "Solving Multi-Agent Safe Optimal Control with Distributed Epigraph Form MARL" offers a novel approach to multi-agent safe optimal control. It proposes Distributed Epigraph Form Multi-Agent Reinforcement Learning (Def-MARL), an algorithm that addresses shortcomings of existing safe multi-agent reinforcement learning (MARL) methods.
Problem Setup and Motivation
The multi-agent safe optimal control problem (MASOCP) requires agents to collaborate toward a common objective while satisfying hard safety constraints with a zero violation budget: no amount of constraint violation is permissible. Existing safe MARL algorithms often exhibit unstable training dynamics under such stringent constraints, particularly as the constraint threshold approaches zero, which renders them ineffective for real-world applications that demand hard safety guarantees.
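As a rough sketch (the notation here is illustrative rather than the paper's exact formulation), the problem couples a shared cumulative cost with a safety constraint whose allowed violation is exactly zero:

```latex
\min_{\pi}\; J(\pi) \;=\; \mathbb{E}\!\left[\sum_{k} l(x_k, u_k)\right]
\quad \text{subject to} \quad h(x_k) \le 0 \quad \text{for all } k,
```

where l is the team cost, h aggregates the agents' safety constraints (e.g., pairwise collision distances), and the right-hand side is exactly 0 rather than a small positive budget.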
Methodological Contributions
Def-MARL's central idea is to rewrite the MASOCP in epigraph form, splitting it into an unconstrained inner problem and a constrained outer problem over a scalar auxiliary variable z that acts as an upper bound on the cost. The inner problem, solved during centralized training, minimizes the maximum of the constraint violation and the amount by which the cost exceeds z. The epigraph form adapts a technique from single-agent safe RL to the multi-agent setting and improves training stability by avoiding the large, poorly conditioned gradients that Lagrangian penalty terms can produce when the constraint threshold is at or near zero.
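Schematically (again with illustrative notation), the reformulation replaces the constrained problem with a one-dimensional outer search over the auxiliary bound z wrapped around an unconstrained inner policy optimization:

```latex
\min_{\pi}\; J(\pi) \;\; \text{s.t.} \;\; V_h(\pi) \le 0
\qquad \Longrightarrow \qquad
\min_{z}\; z \;\; \text{s.t.} \;\;
\underbrace{\min_{\pi}\; \max\big(V_h(\pi),\; J(\pi) - z\big)}_{\text{inner problem}} \;\le\; 0,
```

where V_h denotes the constraint violation incurred by policy π. The inner minimization is unconstrained and is handled during centralized training; the outer minimization over z carries the constraint.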
Furthermore, Def-MARL follows the centralized training with decentralized execution (CTDE) paradigm. The paper proves that the outer problem decomposes so that each agent can solve it independently from local information, enabling distributed execution: agents compute their respective cost bounds without centralized oversight, which contributes directly to the robustness and scalability of the algorithm.
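A minimal sketch of what decentralized execution could look like, assuming a z-conditioned policy network and a learned estimate of the inner problem's value; the function names and the grid search over z below are hypothetical stand-ins, not the paper's implementation:

```python
import numpy as np

def choose_z(total_value_fn, obs, z_grid):
    """Outer problem, solved locally by one agent: pick the smallest cost
    upper bound z whose estimated inner value (max of constraint violation
    and cost-minus-z) is non-positive. total_value_fn is a hypothetical
    learned estimator conditioned on the agent's local observation."""
    feasible = [z for z in z_grid if total_value_fn(obs, z) <= 0.0]
    # If no candidate looks feasible, fall back to the largest z,
    # i.e., the most conservative (safety-first) cost bound.
    return min(feasible) if feasible else z_grid[-1]

def act(policy_fn, total_value_fn, obs, z_grid):
    """Decentralized execution for one agent: solve the 1-D outer problem
    over z, then query the z-conditioned policy for an action."""
    z = choose_z(total_value_fn, obs, z_grid)
    return policy_fn(obs, z)

if __name__ == "__main__":
    # Stand-in networks for illustration only.
    rng = np.random.default_rng(0)

    def total_value(obs, z):
        # Pretend the estimated inner value decreases as z grows.
        return 0.5 - 0.1 * z

    def policy(obs, z):
        # Pretend policy: a 2-D action that depends weakly on z.
        return np.tanh(rng.normal(size=2)) * (1.0 + 0.01 * z)

    obs = rng.normal(size=8)
    z_grid = np.linspace(0.0, 10.0, 21)
    print(act(policy, total_value, obs, z_grid))
```

The design point this sketch illustrates is that the outer search is one-dimensional and uses only the agent's local observation, so no central coordinator is needed at execution time.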
Empirical Evidence
The simulation experiments span several environments, including modified Multi-Agent Particle Environments (MPE) and Safe Multi-Agent MuJoCo, where Def-MARL outperforms the baseline algorithms. Notably, Def-MARL matches the safety of the most conservative baselines while achieving task performance comparable to the most aggressive ones. Moreover, the method performs consistently across environments with unchanged hyperparameters, indicating robustness to varying settings, which is critical for real-world deployment.
In hardware experiments with Crazyflie quadcopters, Def-MARL completed complex collaborative tasks with high safety compliance and task success rates, surpassing MPC-based baselines that either lack coordination or behave erratically due to local minima and infeasibility.
Theoretical and Practical Implications
The implications of this research extend into both theory and practice. The theoretical framework paves the way for addressing zero-constrained safety in MARL, creating potential for advancements in the design of robust multi-agent systems. Practically, the work suggests feasible paths for implementing RL-based agent coordination in industries reliant on MAS, such as autonomous drones, vehicular networks, and robotics in dynamic environments.
Future Prospects
Future developments might explore the optimization of communication strategies among agents during distributed execution, addressing potential gaps in theoretical guarantees when communication is limited or infeasible. Additionally, adapting Def-MARL for scenarios involving noise, disturbances, and communication delays presents intriguing challenges.
In conclusion, "Solving Multi-Agent Safe Optimal Control with Distributed Epigraph Form MARL" provides a promising advancement in MARL, emphasizing safety and distributed coordination. Its contributions stand poised to influence both theoretical research directions and practical applications in MAS safety and efficiency, stimulating further exploration in AI-driven collaborative systems.