ACIArena: Toward Unified Evaluation for Agent Cascading Injection

Published 9 Apr 2026 in cs.AI, cs.CL, and cs.CR | (2604.07775v1)

Abstract: Collaboration and information sharing empower Multi-Agent Systems (MAS) but also introduce a critical security risk known as Agent Cascading Injection (ACI). In such attacks, a compromised agent exploits inter-agent trust to propagate malicious instructions, causing cascading failures across the system. However, existing studies consider only limited attack strategies and simplified MAS settings, limiting their generalizability and comprehensive evaluation. To bridge this gap, we introduce ACIArena, a unified framework for evaluating the robustness of MAS. ACIArena offers systematic evaluation suites spanning multiple attack surfaces (i.e., external inputs, agent profiles, inter-agent messages) and attack objectives (i.e., instruction hijacking, task disruption, information exfiltration). Specifically, ACIArena establishes a unified specification that jointly supports MAS construction and attack-defense modules. It covers six widely used MAS implementations and provides a benchmark of 1,356 test cases for systematically evaluating MAS robustness. Our benchmarking results show that evaluating MAS robustness solely through topology is insufficient; robust MAS require deliberate role design and controlled interaction patterns. Moreover, defenses developed in simplified environments often fail to transfer to real-world settings; narrowly scoped defenses may even introduce new vulnerabilities. ACIArena aims to provide a solid foundation for advancing deeper exploration of MAS design principles.

Abstract PDF Upgrade to Chat

Authors (9)

Summary

The paper introduces ACIArena, a unified evaluation framework that systematically quantifies ACI vulnerabilities across diverse MAS topologies and attack surfaces.
It utilizes 1,356 test cases spanning 28 derived attacks to assess benign utility, attack success rate, and propagation vulnerability in realistic scenarios.
Empirical findings reveal that MAS robustness critically depends on agent roles, topology, and tailored defenses, exposing limitations of conventional strategies.

ACIArena: A Unified Benchmark for Agent Cascading Injection Evaluation in LLM-Based Multi-Agent Systems

Introduction

The proliferation of LLM-based Multi-Agent Systems (MAS) has led to strong advancements in complex collaborative tasks such as automated code generation, mathematical reasoning, and large-scale workflow orchestration. While such systems demonstrate superior effectiveness through iterative inter-agent information sharing, they are exposed to increasingly sophisticated security threats. The Agent Cascading Injection (ACI) threat model describes attacks where one compromised agent exploits trust relationships and communication patterns to initiate a cascade of malicious behavior—resulting in system-wide failures, data exfiltration, or systemic task hijacking. Existing evaluations of such attacks often utilize inconsistent or overly simplified settings, precluding robust or transferable defense development. The ACIArena framework (2604.07775) seeks to fill this gap by providing a systematic, extensible, and standardized evaluation suite for MAS ACI security.

Framework and Scope of ACIArena

ACIArena is architected to support comprehensive analysis of MAS vulnerabilities to ACI attacks across realistic threat scenarios, attack surfaces, and adversarial objectives. Its modular design offers three principal dimensions of coverage:

Attack Surfaces: External inputs, agent profile manipulation, and inter-agent message poisoning.
Attack Objectives: Instruction hijacking, task disruption, and information exfiltration.
MAS Implementations: Six widely deployed or studied MAS architectures are included, reflecting vertical, horizontal, and hierarchical topologies.
Figure 1: Framework overview illustrating attack entry points, propagation pathways, and the dynamic nature of ACI exploits within MAS.

The benchmark encompasses 28 derived ACI attacks, yielding 1,356 systematic test cases. Each case pairs benign multi-step, verifiable tasks in targeted domains (math, programming, science, and medical) with a concrete malicious intervention.

Figure 2: Statistical distribution of attacks, objectives, and tested MAS instances in the ACIArena benchmark.

This design facilitates reproducible, apples-to-apples evaluations—enabling clear attributions of observed vulnerabilities to system design, agent configuration, or specific defensive interventions.

Threat Model and Attack Taxonomy

Attackers in ACIArena are modeled with no white-box access—mirroring practical situations where the internal model and agent code are inaccessible. Attack surface taxonomy is operationalized as follows:

Adversarial Input: Direct prompt or memory manipulation at input.
Malicious Agent (Profile Injection): Payloads modify agent configuration to alter downstream behavior.
Message Poison: Adversary forges or modifies inter-agent communications at the edge level.

Objectives probe distinct real-world ACI risks: inducing deviation from user intent (hijacking), degrading collaborative success rates (disruption), and extracting confidential content (exfiltration). Attack generation is automated using a mutation- and selection-based process with LLM judges optimizing for both stealth and harmfulness.

Evaluation Protocols and Metrics

Benchmark results are measured according to:

Benign Utility (BU): Pass@1 or task success rate without attack.
Attack Success Rate (ASR): Fraction of cases achieving adversary objectives (automatically or LLM-judge assessed).
Utility under Attack (UA): Retained utility in the presence of attack.
Propagation Vulnerability Index (PVI): Weighted propagation analysis considering the topological distance from compromised agents to system output, modulated by ASR.

These metrics quantify both direct success of individual exploits and the secondary risk of system-wide contamination.

Empirical Findings: MAS Robustness and Vulnerability Analysis

Sensitivity to Role Design and Topology

Analysis bridges prior work focusing only on topology—demonstrating that identical inter-agent communication graphs can yield widely divergent ASR results depending on agent role assignments. For example, a Corba-style recursive attack achieves radically different ASRs as profile combinations vary.

Figure 3: ASR variation of Corba attack with fixed topology but differing GPT-4o agent profiles.

This finding confirms that meaningful security robustness evaluations require systematic control of both topology and social (role-based) MAS design.

System-Level Vulnerabilities

All tested MAS architectures demonstrated substantial susceptibility to ACI; even with minimal adversarial presence (one compromised agent), attacks frequently propagate and succeed.

Figure 4: Agent-level ASR and PVI aggregation across seven MAS, emphasizing which agent roles yield maximal propagation risk.

Notably, systems with explicit critic or verifier roles (e.g., AgentVerse, CAMEL) provide some attenuation by bottlenecking high-risk propagation. However, increased role complexity or denser connectivity may actually amplify ASR unless paired with unidirectional restrictions.

Domain-Specific and Model-Dependent Risk

MAS in code generation domains are particularly vulnerable: in some configurations, hijacking and disruption ASR approaches 100%, with benign utility sharply reduced.

Model capability paradoxically exacerbates some vulnerabilities. Stronger LLMs (e.g., GPT-4o) display higher ASR in several attack suites compared to smaller models (e.g., Qwen2.5).

Figure 5: Average ASR by model: higher-scale LLMs often lead to increased attack success across suites.

Attack Surface Efficacy

Precision attacks (hijacking, exfiltration) generally succeed most often through adversarial inputs, whereas complex message-based or profile-based assaults are highly effective for disruption tasks.

Figure 6: Average ASR stratified by attack surface, revealing clear differential in surface effectiveness dependent on attack objectives.

Practical Defense Evaluation

Canonical defenses, including token-level classifiers and prompt delimiters, are found to offer limited ACI robustness—often failing to generalize to agent-to-agent cascades. Profile-based defenses such as AGrail or topological solutions like G-Safeguard can achieve partial mitigation but incur utility loss or excessive computational overhead.

A new defense, ACI-Sentinel, applies a "semantic least privilege" filter to retain only contextually essential information at each agent step. This method substantially reduces ASR in several MAS (e.g., exfiltration ASR drops by 53.33% in AutoGen), but complete elimination of attack success and cost-free performance remains unachieved. Residual vulnerabilities persist, especially as attacks adapt to defeat message-only or semantic filtering.

Implications, Limitations, and Future Research Trajectories

ACIArena demonstrates that prevailing assumptions around MAS robustness, model scaling, and defense generalizability are insufficient when exposed to rigorous, systematic ACI benchmarking.

Key practical implications are:

Defensive evaluation in MAS must systematically enumerate diverse attack surfaces and objectives, ensuring negative results under restricted settings are not overgeneralized.
MAS security design must co-optimize roles, topology, and propagation-aware filtering to achieve meaningful decreases in system-level ASR.
Single-agent defense strategies are inadequate for MAS, as inter-agent trust dynamics and cascading behaviors invalidate their assumptions.

This work points to required research in robustness-oriented system design, dynamic agent reconfiguration protocols that reduce attack surface overlap, and holistic evaluation pipelines that integrate adversarial, utility, and propagation metrics for development and deployment cycles.

Conclusion

ACIArena provides a unified, extensible platform for benchmarking ACI robustness in LLM-based MAS, addressing both technical debt in evaluation practice and deepening theoretical understanding of MAS security. MAS designers, researchers, and deployers must leverage such systematic evaluations as a prerequisite for any claims of real-world reliability and trustworthiness.

Reference: "ACIArena: Toward Unified Evaluation for Agent Cascading Injection" (2604.07775)

Markdown Report Issue