- The paper shows that control-flow hijacking in multi-agent systems results in arbitrary malicious code execution with success rates ranging from 45% to 64%.
- It evaluates three frameworks—AutoGen, Crew AI, and MetaGPT—and reveals that traditional prompt injection defenses often fail to prevent these hijacking attacks.
- It emphasizes the need for robust trust and isolation mechanisms to safeguard MAS from metadata-based attacks and unauthorized control flow alterations.
Multi-Agent Systems Execute Arbitrary Malicious Code
This essay examines the vulnerabilities within multi-agent systems (MAS) concerning malicious code execution. The susceptibility of these systems to control-flow hijacking attacks raises vital security concerns, prompting careful scrutiny and the development of defense mechanisms.
Introduction
Multi-agent systems represent an innovative approach to leveraging autonomous AI agents, primarily driven by LLMs, to perform complex tasks. Such systems typically interact with a plethora of untrusted inputs, making them liable to attack vectors. Control-flow hijacking attacks exploit this interaction, allowing adversarial content to manipulate communications and invoke unsafe behaviors within the system.
Multi-agent systems with LLM-based architectures often coordinate agents to operate in environments enriched with both trusted and untrusted elements. Untrusted inputs, such as malicious web content and files, offer adversaries a surface to compromise the control flows, potentially culminating in arbitrary code execution and data exfiltration. Figure 1 exemplifies such control-flow hijacking attacks, highlighting differences between benign and malicious execution.
Figure 1: Example of a control-flow hijacking attack, showing control flows in a benign execution and a hijacked execution.
Attack Distinctions
MAS hijacking presents a new threat model distinct from prompt injection attacks. It specifically targets the orchestration layer and employs manipulation of metadata for system coordination rather than affecting individual agent outputs. Thus, these attacks have demonstrated notable effectiveness even when individual agents are not susceptible to prompt injections.
The attack vectors manipulate the communication protocols and orchestration, allowing adversaries to execute harmful code remotely. A MAS setup typically involves pathways where shared metadata guides agent interactions and decisions. This makes it easier for adversaries to reroute actions based on manipulated metadata within the system.
Experimental Setup
Three robust multi-agent frameworks—AutoGen, Crew AI, and MetaGPT—were evaluated to understand their vulnerability spectrum. Each framework demonstrated significant susceptibility to control-flow hijacking attacks, evidenced by successful execution rates ranging from 45% to 64%. These results raise critical security implications, given that traditional prompt injection attacks within similar setups showed near-zero success rates.
Experiments highlight the inadequacies in current safeguarding mechanisms against MAS hijacking, emphasizing the need for comprehensive strategies targeting metadata manipulation and orchestration controls.
Results
Research findings indicate that cross-modal capabilities of MAS can be exploited, employing diverse input types, such as files, webpages, and even multimedia content like videos and images. This multi-modal perspective underscores the scope of potential attacks as MAS evolve. Figure 2 illustrates a file-based MAS hijacking attack on a centrally orchestrated system.
Figure 2: Example of a file-based MAS hijacking attack on a centrally orchestrated MAS.
Also examined were benign task success rates, contrasting them with attack success rates. The performance metrics reveal the ease with which MAS hijacking can imitate permissible activities, thereby executing harmful code with substantial success rates.
Discussion
The vulnerability of multi-agent systems due to the blurred lines between trusted and untrusted content is concerning. Unlike web browsers, MAS lack sophisticated isolation mechanisms, making them prone to metadata-driven attacks. As MAS grow in capabilities and agentic intelligence, addressing these vulnerabilities is crucial for safeguarding user data and system integrity.
MAS hijacking attacks occur due to systemic design flaws where agents blindly trust outputs from other agents, failing to verify the integrity of metadata transmitted. To mitigate these risks, developing robust trust frameworks or policies to manage agent interactions is recommended. Until then, deploying MAS in unprotected environments poses a high-risk scenario.
Figure 3: A multi-modal MAS hijacking attack utilizing webpages and audio content of videos. Initially, in steps 1a and 1b, the MAS interacts with a malicious website or video, which prompts it to download a malicious key.txt file in step 2. Finally, in step 3, the MAS attempts to open the downloaded file and executes it instead, similar to the local-file attack in Figure~\ref{fig:mas_hijacking}.
Conclusion
Multi-agent systems herald a transformative paradigm in AI, but their current susceptibility to control-flow hijacking attacks poses significant security threats. Future research should focus on enhancing MAS safety features, particularly distinguishing between trusted and untrusted content and refining agent communication protocols. Until then, caution should be exercised when deploying MAS in environments requiring high security and data protection.