Red-Teaming LLM Multi-Agent Systems via Communication Attacks (2502.14847v2)
Abstract: LLM-based Multi-Agent Systems (LLM-MAS) have revolutionized complex problem-solving capability by enabling sophisticated agent collaboration through message-based communications. While the communication framework is crucial for agent coordination, it also introduces a critical yet unexplored security vulnerability. In this work, we introduce Agent-in-the-Middle (AiTM), a novel attack that exploits the fundamental communication mechanisms in LLM-MAS by intercepting and manipulating inter-agent messages. Unlike existing attacks that compromise individual agents, AiTM demonstrates how an adversary can compromise entire multi-agent systems by only manipulating the messages passing between agents. To enable the attack under the challenges of limited control and role-restricted communication format, we develop an LLM-powered adversarial agent with a reflection mechanism that generates contextually-aware malicious instructions. Our comprehensive evaluation across various frameworks, communication structures, and real-world applications demonstrates that LLM-MAS is vulnerable to communication-based attacks, highlighting the need for robust security measures in multi-agent systems.
Summary
- This paper presents the Agent-in-the-Middle (AiTM) attack, showing that manipulating communication between agents can compromise LLM Multi-Agent Systems.
- The AiTM attack uses an LLM-powered agent with reflection to analyze context and manipulate messages convincingly for stealthy system compromise.
- The attack shows high effectiveness across various frameworks and applications, revealing a significant vulnerability in LLM-MAS communication.
LLM-based Multi-Agent Systems (LLM-MAS) leverage communication between individual LLM agents to collaboratively address complex tasks. While this communication is fundamental to their operation, it concurrently introduces significant security vulnerabilities that have remained relatively unexplored. The work "Red-Teaming LLM Multi-Agent Systems via Communication Attacks" (2502.14847) introduces a specific attack vector, termed Agent-in-the-Middle (AiTM), targeting these inter-agent communication channels. This approach contrasts with prior methods focusing on compromising individual agent prompts or parameters, instead demonstrating systemic compromise via message manipulation alone.
Agent-in-the-Middle (AiTM) Attack Mechanism
The core premise of the AiTM attack is the interception and manipulation of messages exchanged between agents within an LLM-MAS. An adversary, acting as a "man-in-the-middle," intercepts a message intended for a recipient agent, modifies its content to serve malicious objectives, and forwards the altered message. This exploits the inherent trust agents place in the messages received through the system's communication infrastructure.
Key characteristics of the AiTM attack include:
- Targeting Communication: Unlike jailbreaking, prompt injection, or model poisoning which target individual agents or their underlying models, AiTM focuses exclusively on the communication link. This allows bypassing defenses aimed at securing individual agent prompts or behaviors.
- Systemic Impact: By manipulating the flow of information and instructions, AiTM can potentially compromise the entire system's behavior, leading to incorrect task outcomes, resource waste, information leakage, or complete task hijacking, even if individual agents remain uncompromised in their internal state or LLM parameters.
- Exploiting Implicit Trust: LLM-MAS often operate under the assumption that received messages are authentic and originate from the claimed sender agent. AiTM directly subverts this implicit trust.
Implementing AiTM faces challenges inherent to MAS structures:
- Limited Control: The attacker may only control the communication channel, without direct access to agent internals or the overarching orchestration logic.
- Role-Restricted Formats: Communication protocols within MAS often impose strict formatting and content constraints based on predefined agent roles and interaction patterns. Malicious messages must adhere to these formats to avoid immediate detection.
Adversarial Agent Design and Implementation
To overcome the challenges of contextual relevance and adherence to communication formats, the paper proposes using an LLM-powered adversarial agent to perform the message manipulation. This adversarial agent is designed to understand the conversational context and generate effective, stealthy manipulations.
A core component of this adversarial agent is a reflection mechanism. This mechanism enables the agent to iteratively refine the malicious message content. The process typically involves:
- Contextual Analysis: Upon intercepting a message, the adversarial agent analyzes the conversation history, the roles of the sender and receiver, the overall goal of the MAS, and the specific content of the intercepted message (morig).
- Malicious Goal Integration: The agent incorporates the adversary's objective (e.g., inject false information, change task instructions, escalate privileges).
- Candidate Generation: Based on the context and goal, the LLM generates one or more candidate manipulated messages (mmal).
- Reflection and Refinement: The agent critically evaluates the generated candidates against several criteria:
- Effectiveness: Does mmal likely achieve the malicious goal?
- Stealthiness: Does mmal appear plausible within the conversational context and adhere to expected formats? Will it raise suspicion?
- Consistency: Is mmal consistent with the perceived persona and role of the original sender?
- Selection/Iteration: The agent selects the best candidate or iterates the generation and refinement process until a satisfactory malicious message is produced.
This reflection process can be implemented using carefully designed prompts that guide the adversarial LLM through the analysis, generation, and evaluation steps. It mirrors cognitive processes of planning and revision, allowing the LLM to generate sophisticated and contextually appropriate manipulations.
The architecture can be conceptualized as:
1 2 3 4 5 6 7 8 |
Intercepted Message (m_orig) -> Adversarial LLM Agent { 1. Analyze Context (History, Roles, Goals) 2. Generate Initial Manipulation (m_mal_candidate_1) based on Adversarial Goal 3. Reflect & Evaluate (Effectiveness, Stealth, Consistency) 4. If unsatisfactory: Refine Prompt -> Generate New Manipulation (m_mal_candidate_n) -> Go to 3 5. Select Final Manipulation (m_mal_final) } -> Forward Manipulated Message (m_mal_final) |
Attack Implementation Strategies and Execution
Practical implementation of AiTM requires establishing an interception point within the MAS communication infrastructure. This could involve:
- Modifying Framework Code: Directly altering the message passing functions (e.g.,
send
,receive
) within the MAS framework (like AutoGen or AgentVerse). - Network Interception: If agents communicate over a network, standard Man-in-the-Middle techniques (e.g., ARP spoofing, DNS hijacking, proxy manipulation) can be used.
- Compromised Communication Bus: If the MAS relies on a central message broker or bus, compromising this component allows message interception and manipulation.
Once interception is established, the adversarial LLM agent is invoked. The paper explores various attack strategies achievable via message manipulation:
- Role Promotion/Impersonation: Modifying messages to grant the attacker's agent or another agent unauthorized capabilities or decision-making power.
- Task Hijacking: Altering instructions or goals conveyed in messages to divert the MAS towards the attacker's objectives.
- Information Manipulation: Injecting false information, omitting critical details, or subtly biasing the content of messages to mislead agents and corrupt the final output.
- Denial of Service: Corrupting messages to cause communication failures or agent errors, halting progress.
The execution involves deploying the interception mechanism and the adversarial agent, defining the malicious objective (encoded into the agent's instructions/prompts), and allowing the MAS to run. The attack's success depends on the adversarial agent's ability to generate believable manipulations that influence the MAS behavior as intended.
Evaluation Across Frameworks and Applications
The effectiveness of AiTM was evaluated across diverse settings:
- Frameworks: AutoGen, AgentVerse, demonstrating applicability to different MAS architectures.
- Communication Structures: Hierarchical (e.g., manager-worker) and broadcast (e.g., group chat) structures were tested.
- Applications:
- Software Development: Manipulating requirements or code reviews.
- Scientific Discovery: Injecting false data or hypotheses.
- Question Answering: Altering retrieved information or intermediate reasoning steps.
- Multi-Agent Bargaining: Manipulating offers or preferences.
The results reportedly demonstrate significant vulnerability. The AiTM attack achieved high success rates in manipulating agent behavior and degrading task performance across these varied scenarios. Specific quantitative results from the paper (which would need to be consulted for exact figures) likely quantify the percentage of successful task hijacks, the degree of output corruption (e.g., error rate increase), or the success rate of specific manipulation tactics like role promotion under different conditions. The paper highlights that vulnerability is not confined to specific frameworks or communication patterns but is a more general issue stemming from the reliance on message-based coordination.
Practical Implications for Red-Teaming LLM-MAS
The AiTM methodology provides a concrete framework for red-teaming LLM-MAS, moving beyond individual agent security to assess systemic vulnerabilities in communication. A red-teaming exercise using AiTM would involve:
- System Reconnaissance: Identify the MAS framework, communication structure, agent roles, message formats, and communication protocols. Map the flow of information.
- Threat Modeling: Determine potential malicious objectives relevant to the target MAS (e.g., data exfiltration, output manipulation, denial of service).
- Interception Point Setup: Implement a mechanism to intercept messages based on the system's architecture (e.g., modify framework code for white-box tests, network interception for black-box tests).
- Adversarial Agent Deployment: Configure and deploy an LLM-powered adversarial agent capable of context analysis, goal-driven manipulation, and reflection, tailored to the specific message formats and interaction styles.
- Attack Execution: Run the MAS on relevant tasks while the AiTM attack is active, monitoring agent interactions and system outputs.
- Vulnerability Analysis: Analyze the results to determine if the malicious objectives were achieved, assess the impact, and identify specific weaknesses in the communication protocol or agent interaction design.
- Reporting and Mitigation: Document the vulnerabilities and suggest countermeasures.
This process allows security practitioners to proactively identify and address weaknesses in how multi-agent systems coordinate and share information before deployment in critical applications. Required resources include expertise in LLM prompting, MAS frameworks, and potentially network security, along with computational resources for running the adversarial LLM agent.
Potential Mitigation Strategies
While the paper focuses on demonstrating the vulnerability, the findings implicitly suggest potential mitigation directions:
- Message Authentication and Integrity: Implementing cryptographic signatures (e.g., HMAC) to ensure messages originate from the claimed sender and have not been tampered with.
- Secure Communication Channels: Using end-to-end encryption (E2EE) for inter-agent communication, though this might not prevent manipulation if the endpoint initiating the communication is compromised or if an intermediary is trusted.
- Communication Monitoring and Anomaly Detection: Analyzing communication patterns, message content, and agent interactions for deviations from expected behavior that might indicate manipulation.
- Formal Verification of Communication Protocols: Rigorously defining and verifying the security properties of the communication protocols used.
- Agent-Level Verification: Agents could potentially cross-verify critical information received through redundant channels or query trusted external sources, though this increases complexity.
Conclusion
The introduction of the Agent-in-the-Middle (AiTM) attack (2502.14847) highlights a critical vulnerability surface in LLM-based Multi-Agent Systems: the inter-agent communication layer. By employing an LLM-powered adversary with a reflection mechanism, attackers can intercept and manipulate messages to achieve systemic compromise without directly attacking individual agents. The demonstrated effectiveness across various frameworks, structures, and applications underscores the necessity for developing and implementing robust security measures specifically addressing communication integrity and authenticity within LLM-MAS as these systems become more prevalent. This research provides a valuable methodology for red-teaming efforts aimed at securing complex collaborative AI systems.