- The paper proposes a dynamic defense paradigm that reconstructs LLM-based multi-agent systems (MAS) as directed acyclic graphs and uses backward propagation to detect malicious communication edges.
- It introduces a contribution extraction method that builds a signed network, using an independent LLM to assign a contribution score to each communication edge.
- On MMLU, the method achieved an average detection success rate of 93%, outperforming defenses such as G-Safeguard and AgentXposed.
Monitoring LLM-based Multi-Agent Systems Against Corruptions via Node Evaluation
Introduction
LLMs are increasingly integrated into complex Multi-Agent Systems (MAS), where they serve as the core that enables communication among agents. This integration enhances capabilities but also raises trustworthiness concerns, because agents may be subject to corruption attacks. Unlike single-agent systems, MAS depend on complex communication processes, and dynamic, evolving attacks can exploit them. This paper proposes a dynamic defense paradigm that continuously monitors the MAS communication graph, disrupts malicious communications, and adjusts the graph topology to counter these attacks.
Figure 1: An overview of our method. In step 1, we reconstruct the MAS as a directed acyclic graph (DAG). In steps 2 and 3, we extract each agent's contribution to the final decision by assigning a contribution score to every edge and propagating backward from the final decision. This reveals latent malicious agents. In step 4, we repair the MAS by removing the information sent by the detected malicious agents. Dashed lines indicate deleted communication edges.
Methodology
The proposed method treats the MAS as a directed acyclic graph (DAG), where nodes represent agents and directed edges represent communications between them. This representation lets the method trace how information flows toward the final decision and detect malicious agents through a novel backward propagation technique.
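MAS Graph Model: The MAS is modeled as a DAG for computational convenience. Nodes represent agents at different time steps, and edges represent directed communications between them. Messages sent at one time step are received at a later one. Representing each agent as a separate node at every time step therefore keeps the graph acyclic, even when two agents message each other repeatedly.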
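As a minimal sketch of this graph model, the Python below unrolls a message log into a time-indexed graph and checks that the result is acyclic. Several pieces are hypothetical illustrations, not the paper's implementation:

- the log format `(sender, receiver, round)`;
- the assumption that a message sent in round t is received in round t + 1;
- the helpers `build_dag` and `is_acyclic`.

```python
from collections import defaultdict, deque

# Hypothetical message record: (sender, receiver, round).
Message = tuple[str, str, int]
# Each agent at each round is a separate node: (agent, round).
Node = tuple[str, int]


def build_dag(messages: list[Message], agents: list[str], num_rounds: int):
    """Unroll a message log into a time-indexed graph (illustrative sketch).

    Assumption: a message sent in round t is received in round t + 1, so it
    becomes an edge (sender, t) -> (receiver, t + 1). Every edge points forward
    in time, which is what keeps the graph acyclic.
    """
    nodes: list[Node] = [(a, t) for t in range(num_rounds + 1) for a in agents]
    edges: dict[Node, list[Node]] = defaultdict(list)
    for sender, receiver, t in messages:
        if sender not in agents or receiver not in agents:
            raise ValueError(f"unknown agent in message {(sender, receiver, t)}")
        if not 0 <= t < num_rounds:
            raise ValueError(f"round {t} is outside 0..{num_rounds - 1}")
        edges[(sender, t)].append((receiver, t + 1))
    return nodes, dict(edges)


def is_acyclic(nodes: list, edges: dict) -> bool:
    """Kahn's algorithm: the graph is a DAG iff every node can be sorted."""
    indegree = {n: 0 for n in nodes}
    for targets in edges.values():
        for v in targets:
            indegree[v] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    sorted_count = 0
    while queue:
        u = queue.popleft()
        sorted_count += 1
        for v in edges.get(u, []):
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return sorted_count == len(nodes)


if __name__ == "__main__":
    # A and B message each other in both directions during two rounds.
    log = [("A", "B", 0), ("B", "A", 0), ("A", "B", 1), ("B", "A", 1)]
    nodes, edges = build_dag(log, ["A", "B"], num_rounds=2)
    print(edges[("A", 0)])           # [('B', 1)]
    print(is_acyclic(nodes, edges))  # True
```

In the usage example, A and B exchange messages in both directions. The unrolled graph is still a DAG because every edge points from round t to round t + 1.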
Contribution Extraction: Each communication edge receives a contribution score, and together these scores form a signed network. The sign of an edge indicates whether its contribution is positive, negative, or neutral. An independent LLM performs this evaluation, which keeps the scoring consistent and reliable.
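The scoring step can be sketched as follows. The sketch assumes the judge is any callable that takes a message and the receiver's output and returns one of the labels `positive`, `neutral`, or `negative`. The `Judge` interface, the `SIGN` mapping, `score_edges`, and `toy_judge` are all hypothetical. `toy_judge` is a deterministic stand-in that merely compares final answer tokens instead of calling an LLM. The paper's actual prompts and score scale are not reproduced here.

```python
from typing import Callable

# Hypothetical judge interface: (message, receiver_output) -> label.
# In the described method this role is played by an independent LLM that is
# not part of the MAS; any callable with this shape can stand in for it here.
Judge = Callable[[str, str], str]

# Edge sign for each label the judge may return.
SIGN = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def score_edges(edge_messages: dict, judge: Judge) -> dict:
    """Assign a signed contribution score to every communication edge.

    edge_messages maps an edge (sender_node, receiver_node) to a pair
    (message_text, receiver_output). Labels outside SIGN count as neutral.
    """
    scores = {}
    for edge, (message, receiver_output) in edge_messages.items():
        label = judge(message, receiver_output).strip().lower()
        scores[edge] = SIGN.get(label, 0.0)
    return scores


def toy_judge(message: str, receiver_output: str) -> str:
    """Deterministic stand-in for the LLM judge, for demonstration only.

    Treats the last token of each text as its stated answer: a message whose
    answer the receiver adopted is positive, a contradicting one is negative.
    """
    if not message.strip() or not receiver_output.strip():
        return "neutral"
    same = message.split()[-1] == receiver_output.split()[-1]
    return "positive" if same else "negative"


if __name__ == "__main__":
    messages = {
        (("A", 0), ("B", 1)): ("I think the answer is C", "Final answer: C"),
        (("M", 0), ("B", 1)): ("Clearly the answer is D", "Final answer: C"),
    }
    print(score_edges(messages, toy_judge))
    # {(('A', 0), ('B', 1)): 1.0, (('M', 0), ('B', 1)): -1.0}
```

Mapping unrecognized labels to 0 is a conservative choice: a malformed judge reply neither rewards nor penalizes the sender.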
Backward Propagation for Detection: Starting from the final decision, contribution scores are propagated backward through the signed network to compute each agent's contribution. Agents whose contribution scores deviate sharply from the rest are flagged as likely malicious. The graph is then adjusted dynamically by removing the communications those agents send.
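One plausible reading of this step is sketched below:

- the final-decision node starts with contribution 1;
- each other node's contribution is the sum, over its outgoing edges, of the edge sign times the receiver's contribution;
- an agent's total is the sum over its time-step nodes.

The propagation rule, the median-plus-fixed-threshold outlier test, the `decision` sink node, and all function names are assumptions for illustration. The paper's exact update rule and detection criterion may differ. The final `prune` call mirrors step 4 of Figure 1 by deleting every edge sent by a flagged agent.

```python
from collections import defaultdict
from statistics import median

Node = tuple[str, int]  # (agent, round)


def reverse_topological(nodes: list, edges: dict) -> list:
    """DFS post-order: every node is listed after all of its successors."""
    order, visited = [], set()

    def visit(u):
        visited.add(u)
        for v in edges.get(u, []):
            if v not in visited:
                visit(v)
        order.append(u)

    for n in nodes:
        if n not in visited:
            visit(n)
    return order


def backward_contributions(nodes, edges, scores, final):
    """Assumed rule: C(final) = 1 and C(u) = sum of s(u, v) * C(v) over edges u -> v."""
    contrib = {n: 0.0 for n in nodes}
    contrib[final] = 1.0
    for u in reverse_topological(nodes, edges):  # successors are always done first
        if u != final:
            contrib[u] = sum((scores.get((u, v), 0.0) * contrib[v] for v in edges.get(u, [])), 0.0)
    return contrib


def agent_totals(contrib, final):
    """Sum each agent's node contributions across rounds, excluding the sink."""
    totals = defaultdict(float)
    for (agent, t), c in contrib.items():
        if (agent, t) != final:
            totals[agent] += c
    return dict(totals)


def detect(totals, threshold):
    """Flag agents whose total deviates from the median by more than threshold."""
    m = median(totals.values())
    return {a for a, s in totals.items() if abs(s - m) > threshold}


def prune(edges, flagged):
    """Repair step: drop every edge sent by a flagged agent."""
    return {u: list(vs) for u, vs in edges.items() if u[0] not in flagged}


if __name__ == "__main__":
    final = ("decision", 2)  # hypothetical sink node holding the final answer
    nodes = [(a, t) for t in (0, 1) for a in ("A", "B", "M")] + [final]
    edges = {
        ("A", 0): [("B", 1)], ("B", 0): [("A", 1)],
        ("M", 0): [("A", 1), ("B", 1)],
        ("A", 1): [final], ("B", 1): [final], ("M", 1): [final],
    }
    # Signs as an independent judge might assign them; M's messages are harmful.
    scores = {
        (("A", 0), ("B", 1)): 1.0, (("B", 0), ("A", 1)): 1.0,
        (("M", 0), ("A", 1)): -1.0, (("M", 0), ("B", 1)): -1.0,
        (("A", 1), final): 1.0, (("B", 1), final): 1.0, (("M", 1), final): -1.0,
    }
    totals = agent_totals(backward_contributions(nodes, edges, scores, final), final)
    print(totals)                         # {'A': 2.0, 'B': 2.0, 'M': -3.0}
    flagged = detect(totals, threshold=1.0)
    print(flagged)                        # {'M'}
    print(sorted(prune(edges, flagged)))  # [('A', 0), ('A', 1), ('B', 0), ('B', 1)]
```

The sketch compares against the median rather than the mean, so a single extreme agent cannot drag the reference point toward itself. With only a handful of agents, a mean-and-standard-deviation test can fail to flag even an obvious outlier.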
Experimental Results
Extensive experiments across various MAS configurations and datasets showed that the proposed method outperforms existing defense mechanisms. On complex tasks from the MMLU dataset, it achieved an average detection success rate of 93%, significantly outperforming baselines such as G-Safeguard and AgentXposed. Under different attack strategies, it maintained robust defense performance, with accuracy dropping only minimally compared with unprotected systems.
Implications and Future Work
This research contributes a dynamic and effective defense mechanism for MAS that addresses evolving security threats through fine-grained node evaluation. Its main implication is greater resilience of collaborative AI systems against corruption. Future research could explore adaptive threshold mechanisms and real-time deployment in large-scale MAS environments to establish the method's practical viability. Such work could in turn inform protection strategies that combine real-time detection with dynamic graph adaptation.
Conclusion
The paper presents a robust method for safeguarding LLM-based MAS that dynamically evaluates node contributions to detect and neutralize malicious agents. It rethinks graph-based MAS defenses, offering an adaptive and proactive solution against increasingly sophisticated adversarial strategies. In comprehensive evaluations, the approach outperformed current defenses and showed strong potential for real-world applications. By making MAS more secure and trustworthy, the method supports the advancement of collaborative AI technologies.