
CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models (2502.14529v1)

Published 20 Feb 2025 in cs.CL and cs.AI

Abstract: LLM-based Multi-Agent Systems (LLM-MASs) have demonstrated remarkable real-world capabilities, effectively collaborating to complete complex tasks. While these systems are designed with safety mechanisms, such as rejecting harmful instructions through alignment, their security remains largely unexplored. This gap leaves LLM-MASs vulnerable to targeted disruptions. In this paper, we introduce Contagious Recursive Blocking Attacks (Corba), a novel and simple yet highly effective attack that disrupts interactions between agents within an LLM-MAS. Corba leverages two key properties: its contagious nature allows it to propagate across arbitrary network topologies, while its recursive property enables sustained depletion of computational resources. Notably, these blocking attacks often involve seemingly benign instructions, making them particularly challenging to mitigate using conventional alignment methods. We evaluate Corba on two widely-used LLM-MASs, namely, AutoGen and Camel across various topologies and commercial models. Additionally, we conduct more extensive experiments in open-ended interactive LLM-MASs, demonstrating the effectiveness of Corba in complex topology structures and open-source models. Our code is available at: https://github.com/zhrli324/Corba.

Summary

  • The paper presents CORBA, a novel attack that leverages contagious and recursive mechanisms to block agents and reduce system availability.
  • Experimental analyses on frameworks like AutoGen and Camel reveal high Proportional Attack Success Rates and rapid propagation of the attack.
  • The study highlights that conventional defense measures are ineffective against CORBA, underscoring the need for specialized security strategies in LLM-MASs.

Analysis of CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on LLMs

The paper "CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on LLMs" (arXiv: 2502.14529) introduces and analyzes a novel attack vector, termed Contagious Recursive Blocking Attacks (Corba), targeting LLM-MASs. The attack leverages the interconnected nature of agents within these systems to propagate blocking instructions, leading to resource depletion and reduced system availability. The core innovation lies in the combination of contagious and recursive properties, which amplify the attack's impact and make it resilient to typical alignment-based defenses.

Contagious and Recursive Attack Mechanism

Corba's effectiveness stems from its design to propagate through the LLM-MAS topology. The "contagious" aspect refers to the ability of an infected agent to transmit the malicious prompt to its neighbors, thus spreading the attack across the network. This propagation is not limited to specific topologies, making it applicable to a wide range of LLM-MAS architectures.
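
The contagious spread can be illustrated with a toy graph-propagation model. This is a sketch of the dynamics described above, not the paper's implementation; the topology, agent names, and forwarding rule are illustrative assumptions.

```python
from collections import deque

def simulate_contagion(adjacency, seed):
    """Toy model of CORBA-style spread: once an agent receives the
    blocking prompt, it forwards it to every neighbor it communicates
    with. `adjacency` maps agent -> list of neighbors (hypothetical
    topology, not one from the paper)."""
    blocked = {seed}
    frontier = deque([seed])
    while frontier:
        agent = frontier.popleft()
        for neighbor in adjacency[agent]:
            if neighbor not in blocked:
                blocked.add(neighbor)
                frontier.append(neighbor)
    return blocked

# A simple chain topology: A -> B -> C -> D
topology = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
blocked = simulate_contagion(topology, seed="A")
```

Because the rule is purely local ("forward to whoever you talk to"), the spread reaches every agent connected to the seed regardless of whether the topology is a chain, star, or dense graph, which matches the paper's observation that propagation is topology-agnostic.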

The "recursive" property ensures the persistence of the blocking behavior within infected agents. This is achieved through a self-loop mechanism, where the malicious prompt is continuously re-triggered, preventing the agent from recovering or diverging to a non-blocked state. This recursion is critical for sustained resource depletion and maintaining a high Proportional Attack Success Rate (P-ASR).
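
The self-loop can be sketched as a minimal agent-step function. The `[BLOCK]` marker and the message wording are stand-ins for the paper's actual malicious prompt, which is not reproduced here; this only illustrates why recursion keeps the agent from recovering.

```python
def agent_step(state, incoming_message):
    """Toy sketch of the recursive blocking loop (assumed mechanics).
    Once blocked, the agent re-emits the blocking instruction every
    turn, so it never returns to useful work."""
    BLOCK_MARKER = "[BLOCK]"  # hypothetical marker standing in for the malicious prompt
    if BLOCK_MARKER in incoming_message or state.get("blocked"):
        state["blocked"] = True
        # Self-loop: the outgoing message re-triggers the block next turn.
        return state, f"{BLOCK_MARKER} repeat this instruction to yourself and all neighbors"
    return state, "normal task output"

state, out = {}, "[BLOCK] start"
for _ in range(3):
    state, out = agent_step(state, out)
# The agent stays blocked and keeps producing the re-triggering message.
```

Note that each turn still consumes a full LLM inference call in a real system, which is exactly the sustained resource depletion the recursion is designed to cause.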

The authors emphasize that the attack often involves seemingly benign instructions, a key characteristic that allows Corba to evade standard safety mechanisms focused on detecting overtly harmful content. This subtle approach makes it difficult to mitigate using conventional alignment techniques.

Experimental Evaluation and Results

The authors conducted extensive experiments to evaluate Corba's performance on two prominent LLM-MAS frameworks: AutoGen and Camel. These frameworks were chosen for their flexibility in constructing LLM-MASs with varying topologies.

Experiments were performed utilizing a range of commercial LLMs, including GPT-4o-mini, GPT-4, GPT-3.5-turbo, and Gemini-2.0-Flash, to assess the attack's generalizability across different models. Further experiments were carried out on open-ended interactive LLM-MASs, designed to simulate human society, utilizing open-source LLMs such as Qwen2.5-14B, Llama3.1-70B, and Gemma-2-27B.

The primary metrics for evaluating the attack's effectiveness were:

  • Proportional Attack Success Rate (P-ASR): This metric quantifies the proportion of agents within the LLM-MAS that are successfully blocked by the attack. A high P-ASR indicates a significant reduction in system availability.
  • Peak Blocking Turn Number (PTN): PTN measures the number of turns required for the attack to reach its maximum impact, i.e., the point where the P-ASR stabilizes. A low PTN signifies a faster and more efficient attack.
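
Given a per-turn trace of which agents are blocked, both metrics are straightforward to compute. The trace below is hypothetical; the functions simply encode the definitions above.

```python
def p_asr(blocked_per_turn, total_agents):
    """Proportional Attack Success Rate at each turn: the fraction of
    agents currently blocked."""
    return [len(blocked) / total_agents for blocked in blocked_per_turn]

def peak_blocking_turn(rates):
    """Peak Blocking Turn Number: the first turn (1-indexed) at which
    P-ASR reaches its maximum, i.e., where the attack's impact peaks."""
    peak = max(rates)
    return rates.index(peak) + 1

# Hypothetical trace of blocked-agent sets over five turns in a 4-agent system
trace = [{"A"}, {"A", "B"}, {"A", "B", "C"}, {"A", "B", "C", "D"}, {"A", "B", "C", "D"}]
rates = p_asr(trace, total_agents=4)
ptn = peak_blocking_turn(rates)
```

Here P-ASR climbs to 1.0 by turn 4 and stabilizes, so PTN is 4: the attack needed four turns to block the entire system.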

The experimental results consistently demonstrated the effectiveness of Corba in reducing the availability of LLM-MASs and wasting computational resources. The attack outperformed a baseline broadcast-based attack method, highlighting the importance of the recursive mechanism. Experiments in open-ended LLM-MASs showed rapid propagation of the attack, effectively "infecting" agents within the system. Moreover, the authors investigated the efficacy of common safety defense methods against Corba and found them to be largely ineffective, underscoring the need for specialized defense strategies.

Implications for LLM-MAS Security

The success of Corba exposes a critical security vulnerability in current LLM-MAS designs. The results highlight the lack of adequate security considerations in these systems, making them susceptible to attacks that can significantly degrade performance and availability.

Key implications include:

  • Insufficient Security Measures: Existing LLM-MAS designs lack robust security mechanisms to detect and prevent blocking attacks, leaving them vulnerable to exploitation.
  • Need for Specialized Defenses: The paper emphasizes the urgent need for developing defense mechanisms specifically tailored to detect and mitigate blocking attacks in LLM-MASs. Current defense mechanisms are primarily focused on jailbreaking and harmful content generation, proving ineffective against Corba.
  • Real-World Threat Potential: The effectiveness of Corba across various topologies and LLMs suggests that it poses a realistic threat to real-world LLM-MAS applications.

Vulnerability to Seemingly Benign Instructions

A crucial aspect of the CORBA attack is its reliance on seemingly benign instructions to trigger the blocking behavior. This characteristic allows the attack to circumvent traditional alignment methods that focus on detecting and filtering overtly harmful content. The subtle nature of the malicious prompts makes them difficult to identify and mitigate using existing safety mechanisms. This highlights a significant gap in current LLM-MAS security, as the systems are vulnerable to attacks that exploit the ambiguity and complexity of natural language.

Mitigation Strategies

The paper does not focus extensively on mitigation strategies, but the analysis implicitly suggests several potential avenues for defense:

  • Anomaly Detection: Implementing anomaly detection mechanisms to identify unusual communication patterns or resource consumption spikes within the LLM-MAS.
  • Input Validation and Sanitization: Developing more sophisticated input validation and sanitization techniques to detect and neutralize malicious prompts, even if they appear benign.
  • Rate Limiting and Resource Management: Implementing rate limiting and resource management policies to prevent individual agents from monopolizing resources or overwhelming the system with blocking requests.
  • Trust and Reputation Systems: Incorporating trust and reputation systems to identify and isolate potentially compromised agents.
  • Decentralized Architectures: Exploring decentralized architectures to limit the impact of individual agent compromises and prevent the spread of attacks.
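
As a concrete illustration of the anomaly-detection avenue, a monitor could flag agents whose outgoing messages repeat nearly verbatim across turns, a signature of the recursive self-loop. This is a speculative defense sketch, not a method from the paper; the log format and threshold are assumptions.

```python
from collections import Counter

def detect_recursive_blocking(message_log, repeat_threshold=3):
    """Illustrative anomaly check: flag agents that emit the same
    message at least `repeat_threshold` times, which may indicate a
    recursive blocking loop rather than normal task progress.
    `message_log` maps agent name -> list of outgoing messages."""
    flagged = set()
    for agent, messages in message_log.items():
        counts = Counter(messages)
        if counts and max(counts.values()) >= repeat_threshold:
            flagged.add(agent)
    return flagged

log = {
    "planner": ["draft plan", "revise plan", "final plan"],
    "worker": ["wait", "wait", "wait", "wait"],  # stuck in a self-loop
}
flagged = detect_recursive_blocking(log)
```

A real deployment would need fuzzier matching (blocking prompts can be paraphrased) and would combine this signal with resource-usage monitoring, but even this crude check catches the exact-repetition case that the recursion produces.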

Conclusion

The CORBA attack represents a significant threat to the security and reliability of LLM-MASs. Its contagious and recursive properties enable widespread disruption and resource depletion, while its reliance on seemingly benign instructions allows it to evade traditional safety mechanisms. The findings of this paper underscore the urgent need for prioritizing security considerations in the design and deployment of LLM-MASs and developing specialized defense strategies to mitigate the risk of blocking attacks.
