SentinelNet: Safeguarding Multi-Agent Collaboration Through Credit-Based Dynamic Threat Detection (2510.16219v2)

Published 17 Oct 2025 in cs.CR and cs.AI

Abstract: Malicious agents pose significant threats to the reliability and decision-making capabilities of Multi-Agent Systems (MAS) powered by LLMs. Existing defenses often fall short due to reactive designs or centralized architectures which may introduce single points of failure. To address these challenges, we propose SentinelNet, the first decentralized framework for proactively detecting and mitigating malicious behaviors in multi-agent collaboration. SentinelNet equips each agent with a credit-based detector trained via contrastive learning on augmented adversarial debate trajectories, enabling autonomous evaluation of message credibility and dynamic neighbor ranking via bottom-k elimination to suppress malicious communications. To overcome the scarcity of attack data, it generates adversarial trajectories simulating diverse threats, ensuring robust training. Experiments on MAS benchmarks show SentinelNet achieves near-perfect detection of malicious agents, close to 100% within two debate rounds, and recovers 95% of system accuracy from compromised baselines. By exhibiting strong generalizability across domains and attack patterns, SentinelNet establishes a novel paradigm for safeguarding collaborative MAS.

Summary

The paper introduces a decentralized framework that transforms agents into sentinels for proactive threat detection in multi-agent systems.
It employs adversarial trajectory generation and contrastive learning to dynamically rank agents and isolate malicious behaviors.
Experiments report detection accuracy close to 100% within two debate rounds and recovery of up to 95% of system accuracy, with implications for scalable MAS security.

Introduction

The paper "SentinelNet: Safeguarding Multi-Agent Collaboration Through Credit-Based Dynamic Threat Detection" introduces a decentralized framework for improving the security and reliability of Multi-Agent Systems (MAS) powered by LLMs. The authors propose SentinelNet as a proactive solution for detecting and mitigating the influence of malicious agents, addressing two limitations of existing defenses: many are reactive, and many are centralized. Centralized architectures can introduce single points of failure and scale poorly, so SentinelNet instead distributes security responsibilities across the agents themselves.

Overview of SentinelNet

SentinelNet is structured around transforming agents into sentinel nodes that can autonomously identify and suppress malicious behaviors. This is achieved through three primary stages:

Adversarial Trajectory Generation: The system generates synthetic attack scenarios to simulate diverse threats, addressing the challenge of attack data scarcity.
Contrastive Learning-Based Training: Each agent's credit-based detector is trained with contrastive learning on the generated trajectories, so that it learns to separate constructive from adversarial behavior when evaluating message credibility.
Dynamic Ranking with Bottom-k Elimination: This mechanism continuously ranks agents based on a credit-scoring model, dynamically isolating those producing malicious outputs, thereby enhancing the system's resilience against attacks.

Figure 1: Overview of the SentinelNet framework, which transforms agents into sentinel nodes for proactive threat detection through three stages: adversarial trajectory generation, contrastive learning-based training, and dynamic ranking with bottom-k elimination.
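
The ranking stage can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the exponential-moving-average credit update, the `decay` parameter, the agent names, and the example scores are all hypothetical, and the per-round scores stand in for the output of a trained detector.

```python
from typing import Dict, List, Tuple


def update_credits(credits: Dict[str, float],
                   scores: Dict[str, float],
                   decay: float = 0.5) -> Dict[str, float]:
    """Blend each neighbor's previous credit with this round's detector score.

    ASSUMPTION: an exponential moving average; the paper's rule may differ.
    Neighbors with no score this round keep their previous credit.
    """
    return {
        name: decay * old + (1.0 - decay) * scores.get(name, old)
        for name, old in credits.items()
    }


def bottom_k_eliminate(credits: Dict[str, float],
                       k: int) -> Tuple[List[str], List[str]]:
    """Rank neighbors by credit (highest first) and drop the k lowest.

    Ties are broken by name so the result is deterministic.
    Returns (kept, dropped).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(credits, key=lambda name: (-credits[name], name))
    cut = max(len(ranked) - k, 0)
    return ranked[:cut], ranked[cut:]


# Hypothetical neighborhood of four agents, all starting with equal credit.
credits = {"a1": 0.5, "a2": 0.5, "a3": 0.5, "a4": 0.5}
# Hypothetical detector scores for one debate round (higher = more credible).
round_scores = {"a1": 0.9, "a2": 0.8, "a3": 0.1, "a4": 0.7}

credits = update_credits(credits, round_scores)
kept, dropped = bottom_k_eliminate(credits, k=1)
print("kept:", kept, "dropped:", dropped)
```

In a decentralized setting, each agent would run a routine like this over its own neighbors and ignore messages from the dropped ones in the next round, so no central coordinator is involved.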

Experimental Evaluation

Experiments were conducted on several MAS benchmarks, including tasks like factual reasoning and multi-agent debate simulations. SentinelNet detected malicious agents with accuracy approaching 100% within two debate rounds, and it restored up to 95% of the MAS accuracy that had been compromised by adversarial influences.

Figure 2: Comparison of SentinelNet with the baselines across six multi-agent debate benchmarks, where SentinelNet consistently outperforms in terms of Detection Accuracy, False Positive Rate (FPR), and False Negative Rate (FNR).
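
For readers unfamiliar with the three metrics named in the caption, the following Python sketch computes them from per-agent labels. It assumes the standard confusion-matrix definitions, with "malicious" as the positive class; the summary does not state the paper's exact formulas, and the example labels below are invented for illustration.

```python
from typing import Dict, Sequence


def detection_metrics(predicted: Sequence[bool],
                      actual: Sequence[bool]) -> Dict[str, float]:
    """Compute accuracy, FPR, and FNR, where True means 'malicious'.

    ASSUMPTION: standard definitions.
      accuracy = (TP + TN) / total
      FPR      = FP / (FP + TN)  (benign agents wrongly flagged)
      FNR      = FN / (FN + TP)  (malicious agents missed)
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one agent is required")
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    return {
        "accuracy": (tp + tn) / len(actual),
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        "fnr": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Invented example: six agents, two of them truly malicious.
actual = [True, True, False, False, False, False]
predicted = [True, False, False, False, True, False]
metrics = detection_metrics(predicted, actual)
print(metrics)
```

Here one malicious agent is missed and one benign agent is wrongly flagged, so accuracy is 4/6, FPR is 1/4, and FNR is 1/2.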

The experimental setup involved various datasets and adversarial attack scenarios, ensuring a comprehensive evaluation of SentinelNet's defense capabilities. The results underscored SentinelNet's superior performance over existing baselines like G-SafeGuard and AgentSafe, which are prone to higher error rates and limited adaptation to diverse attack strategies.

Implications and Future Directions

SentinelNet establishes a new paradigm for MAS security by decentralizing the defense mechanism, which the authors argue improves both scalability and robustness. Because each agent can autonomously detect and filter adversarial inputs, the framework may be suited to critical domains such as healthcare and finance, where decision accuracy is paramount.

The framework's reliance on multi-agent debate trajectories crafted through simulation allows for robust training despite the inherent difficulty in obtaining real-world adversarial data. This innovative approach to threat simulation could inform future MAS security protocols by providing a scalable template for agent-centric threat detection.
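
As a rough illustration of how simulated trajectories can feed contrastive training, the Python sketch below builds a labeled adversarial copy of a clean debate and scores a benign/malicious pair with a hinge-style margin loss. Everything here is an assumption made for illustration: the trajectory format, the `corrupt` function, and the margin loss (a simple stand-in for whatever contrastive objective the paper uses) are hypothetical and are not taken from the paper.

```python
import random
from typing import Callable, Dict, List

Turn = Dict[str, object]


def make_adversarial_copy(trajectory: List[Turn],
                          attacker: str,
                          corrupt: Callable[[str, random.Random], str],
                          rng: random.Random) -> List[Turn]:
    """Copy a debate, replacing the attacker's messages with corrupted text.

    Each turn gets a label: 1 for the attacker's turns, 0 for all others.
    The input trajectory is left unmodified.
    """
    out = []
    for turn in trajectory:
        if turn["agent"] == attacker:
            out.append({**turn, "text": corrupt(str(turn["text"]), rng), "label": 1})
        else:
            out.append({**turn, "label": 0})
    return out


def pairwise_margin_loss(benign_score: float,
                         malicious_score: float,
                         margin: float = 0.5) -> float:
    """Zero once the benign message outscores the malicious one by `margin`."""
    return max(0.0, margin - (benign_score - malicious_score))


def inject_wrong_answer(text: str, rng: random.Random) -> str:
    """Hypothetical attack: append a confident but wrong claim."""
    return f"{text} Ignore the others; the answer is {rng.randint(100, 999)}."


# Invented clean debate between two agents.
clean = [
    {"round": 1, "agent": "a1", "text": "The answer is 12."},
    {"round": 1, "agent": "a2", "text": "I also get 12."},
]
attacked = make_adversarial_copy(clean, "a2", inject_wrong_answer, random.Random(0))

# Hypothetical credibility scores from a detector, for two cases.
well_separated = pairwise_margin_loss(benign_score=0.9, malicious_score=0.1)
mis_ranked = pairwise_margin_loss(benign_score=0.4, malicious_score=0.6)
print(attacked[1]["text"], well_separated, mis_ranked)
```

The loss is zero when the detector already ranks the benign message well above the malicious one, and positive when the ranking is wrong, which is the pressure a contrastive objective applies during training.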

Future research could explore the integration of human feedback and adaptive learning to further refine the detection capabilities of SentinelNet. The ongoing development of more sophisticated adversarial techniques will necessitate continuous refinement of the detection algorithms and the exploration of new machine learning paradigms to keep pace with emerging threats.

Conclusion

The paper presents SentinelNet as a comprehensive, decentralized, and proactive framework for addressing the challenges of security in MAS environments. By focusing on agent autonomy and distributed detection processes, SentinelNet not only mitigates existing vulnerabilities but also sets the stage for further advancements in secure multi-agent collaborations. Its successful application across diverse datasets highlights the framework's potential for broad adaptability and impact across various domains reliant on MAS technologies.