AgentSafe: Safeguarding Large Language Model-based Multi-agent Systems via Hierarchical Data Management (2503.04392v1)

Published 6 Mar 2025 in cs.AI

Abstract: LLM-based multi-agent systems are revolutionizing autonomous communication and collaboration, yet they remain vulnerable to security threats like unauthorized access and data breaches. To address this, we introduce AgentSafe, a novel framework that enhances MAS security through hierarchical information management and memory protection. AgentSafe classifies information by security levels, restricting sensitive data access to authorized agents. AgentSafe incorporates two components: ThreatSieve, which secures communication by verifying information authority and preventing impersonation, and HierarCache, an adaptive memory management system that defends against unauthorized access and malicious poisoning, representing the first systematic defense for agent memory. Experiments across various LLMs show that AgentSafe significantly boosts system resilience, achieving defense success rates above 80% under adversarial conditions. Additionally, AgentSafe demonstrates scalability, maintaining robust performance as agent numbers and information complexity grow. Results underscore the effectiveness of AgentSafe in securing MAS and its potential for real-world application.

Summary

The paper introduces AgentSafe, a security framework safeguarding LLM-based multi-agent systems via hierarchical data management against unauthorized access and data breaches.
AgentSafe features ThreatSieve for securing inter-agent communication via authentication and permission control, and HierarCache for adaptive, hierarchical memory management and attack defense.
Experiments show that AgentSafe defends against topology-based and memory-based attacks, with defense success rates above 80%, while maintaining high information integrity and scaling well.

AgentSafe Framework: A Deep Dive

AgentSafe is a security framework tailored for LLM-based MAS, mitigating vulnerabilities related to unauthorized access and data breaches through hierarchical information management and memory protection. It enforces a security level classification for information, restricting sensitive data access to authorized agents, thereby enabling controllable, traceable, and manageable information flow within the MAS. The framework incorporates two key components: ThreatSieve and HierarCache.

ThreatSieve: Securing Inter-Agent Communication

ThreatSieve is designed to secure communication channels between agents by focusing on authentication, permission validation, and controlled information flow. Its primary functionalities include:

Authentication and Permission Validation: ThreatSieve authenticates the sender agent to prevent identity impersonation attacks. It uses API calls and an LLM to extract and validate identity information embedded within the messages. This mechanism ensures that only verified agents participate in the communication process.
Permission Control: It regulates communication based on the permission levels (security rankings) of the agents involved. Communication is permitted only if the sender's permission level is equal to or greater than the receiver's permission level. Permitted messages are then routed to the appropriate sub-memory within the receiving agent's memory, according to the security ranking of the message.
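
The two checks described above can be sketched as a small gate function. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Agent` and `Message` classes, the `registry` lookup, and `verify_identity` are hypothetical names, and the identity check is a plain string comparison standing in for the API and LLM based extraction that the paper describes.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    level: int  # assumption: a larger number means a higher permission level
    # Hypothetical layout: security level -> messages stored at that level.
    memory: dict = field(default_factory=dict)


@dataclass
class Message:
    sender: str            # identity known from the channel
    claimed_identity: str  # identity stated inside the message
    security_level: int
    content: str


def verify_identity(message, registry):
    """Stand-in for the LLM/API identity check: the claim must match the sender."""
    return message.sender in registry and message.claimed_identity == message.sender


def threat_sieve(message, receiver, registry):
    """Return True and store the message if it passes both checks."""
    if not verify_identity(message, registry):
        return False  # possible impersonation
    sender = registry[message.sender]
    if sender.level < receiver.level:
        return False  # sender's permission level is below the receiver's
    # Route the content to the sub-memory matching the message's security level.
    receiver.memory.setdefault(message.security_level, []).append(message.content)
    return True
```

A rejected message leaves the receiver's memory untouched, so in this sketch the gate is the only path by which content reaches a sub-memory.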

HierarCache: Adaptive and Hierarchical Memory Management

HierarCache provides adaptive memory management and defends against unauthorized access and malicious poisoning of agent memory. This component is structured as a hierarchical database with relationship-based access permissions. Key features of HierarCache include:

Hierarchical Storage: HierarCache organizes agent memory into multiple layers ("drawers"), each corresponding to a specific security level. This segmentation ensures sensitive data isolation, making it accessible only to agents with appropriate authority. The system adaptively stores historical information based on agent relationships, optimizing memory usage and access efficiency.
"Junk Memory" Mechanism: HierarCache incorporates a "Junk Memory" mechanism to mitigate memory-targeted attacks that flood agent memory with redundant or irrelevant information (akin to DDoS attacks). This mechanism evaluates potentially irrelevant information using an instruction-based approach, leveraging hierarchical agent-information relationships and instruction-level comparisons to filter and store such data as "junk," ensuring efficient memory utilization.
Periodic Detection and Isolation: A periodic detection mechanism inspects and isolates false information. An LLM is used to reflect on the stored information, and an instruction library aids in identifying false entries. Identified false information is then moved to the junk memory.
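
To make the three features above concrete, here is a minimal sketch of a drawer-based memory with a junk area and a periodic sweep. The class layout and method names are assumptions made for illustration, and the `is_relevant` and `looks_false` callables are placeholders for the instruction-based comparison and LLM reflection described in the paper, which this sketch does not reproduce.

```python
class HierarCache:
    """Toy hierarchical memory: one drawer per security level plus a junk area."""

    def __init__(self, levels):
        # Assumption: levels are integers 0..levels-1, higher means more sensitive.
        self.drawers = {level: [] for level in range(levels)}
        self.junk = []

    def store(self, item, level, is_relevant):
        """File an item in its drawer, or in junk if the relevance check fails."""
        if level not in self.drawers:
            raise ValueError(f"unknown security level: {level}")
        if not is_relevant(item):
            self.junk.append(item)
            return "junk"
        self.drawers[level].append(item)
        return f"drawer-{level}"

    def read(self, agent_level):
        """Return every item whose drawer level is at or below the agent's level."""
        return [
            item
            for level, items in self.drawers.items()
            if level <= agent_level
            for item in items
        ]

    def periodic_check(self, looks_false):
        """Move items flagged as false into junk; return how many were moved."""
        moved = 0
        for level, items in self.drawers.items():
            kept = []
            for item in items:
                if looks_false(item):
                    self.junk.append(item)
                    moved += 1
                else:
                    kept.append(item)
            self.drawers[level] = kept
        return moved
```

Keeping junk in a separate list rather than deleting it mirrors the description above, where irrelevant or false information is stored as junk instead of being discarded.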

Hierarchical Data Management: Principles and Implementation

AgentSafe's hierarchical data management approach operates on several core principles:

Security Level Classification: Information is categorized based on sensitivity or importance, with each category assigned a specific security level.
Access Control: Access to information is restricted based on the security level of the information and the permission level of the agent requesting access. Only agents with a permission level equal to or higher than the information's security level are granted access.
Information Flow Control: AgentSafe controls the flow of information between agents, ensuring that sensitive data is shared only among authorized agents.
Memory Segmentation: Agent memory is segmented into hierarchical "drawers" based on security levels, providing a structured and organized method for storing and managing information.
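
The access control rule above reduces to a single comparison between two levels. The sketch below shows that comparison; the three level names are invented for illustration, since this summary does not say how many levels the paper uses or what they are called.

```python
from enum import IntEnum


class SecurityLevel(IntEnum):
    # Hypothetical level names; only the ordering matters for this sketch.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


def can_access(agent_permission, info_level):
    """Grant access only if the agent's level is at least the information's level."""
    return agent_permission >= info_level


def visible_items(agent_permission, items):
    """Filter (text, level) pairs down to the texts this agent may read."""
    return [text for text, level in items if can_access(agent_permission, level)]
```

For example, an agent at `SecurityLevel.INTERNAL` would see items tagged `PUBLIC` or `INTERNAL` but not those tagged `CONFIDENTIAL`.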

Performance Evaluation Under Adversarial Conditions

The paper presents comprehensive experimental results demonstrating AgentSafe's effectiveness in securing multi-agent systems against adversarial attacks, specifically topology-based attacks (TBAs) and memory-based attacks (MBAs).

Topology-Based Attacks (TBAs): These attacks exploit agent relationships and authorization hierarchies to gain unauthorized access to sensitive information. AgentSafe demonstrated a high defense success rate, exceeding 80% in TBA scenarios, significantly outperforming a baseline system without AgentSafe. Specifically, AgentSafe achieved an 85.93% defense success rate at turn 5, compared to 50.32% for the baseline, and maintained 82.50% at turn 50.
Memory-Based Attacks (MBAs): These attacks manipulate stored data through misinformation or identity deception, potentially leading to data leakage, malicious poisoning, or system degradation. AgentSafe demonstrated superior information integrity preservation in MBA scenarios. The Cosine Similarity Rate (CSR) remained higher for AgentSafe, indicating better preservation of information integrity, with CSR staying above 0.65 after 10 rounds, while the baseline dropped below 0.4.
Scalability: AgentSafe maintains robust performance as the number of agents and information complexity increase, with CSR between 0.68 and 0.85. This highlights the framework's scalability and suitability for real-world applications. The system also demonstrated a reduction in token consumption.
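
For readers unfamiliar with the two metrics reported above, the following sketch shows how such quantities are commonly computed. It rests on assumptions: this summary does not give the paper's exact definitions, so the defense success rate is taken here as the fraction of attack attempts that were blocked, and the similarity function is ordinary cosine similarity between two vectors (for example, embeddings of the original and the stored information).

```python
import math


def cosine_similarity(u, v):
    """Standard cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def defense_success_rate(outcomes):
    """Fraction of attack attempts blocked; outcomes is a list of booleans."""
    if not outcomes:
        raise ValueError("need at least one attack attempt")
    return sum(outcomes) / len(outcomes)
```

Under this reading, a similarity near 1 means the stored information still closely matches the original, while lower values indicate drift caused by poisoning.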

Conclusion

In summary, AgentSafe is a security framework designed to enhance the resilience of LLM-based MAS. By implementing hierarchical data management, ThreatSieve, and HierarCache, AgentSafe significantly mitigates the risks associated with unauthorized access and data breaches. The experimental results validate its effectiveness in defending against topology-based and memory-based attacks, showcasing its potential for practical application in securing multi-agent systems.

