Here Comes The AI Worm: Unleashing Zero-click Worms that Target GenAI-Powered Applications (2403.02817v2)

Published 5 Mar 2024 in cs.CR

Abstract: In this paper, we show that when the communication between GenAI-powered applications relies on RAG-based inference, an attacker can initiate a computer worm-like chain reaction that we call Morris-II. This is done by crafting an adversarial self-replicating prompt that triggers a cascade of indirect prompt injections within the ecosystem and forces each affected application to perform malicious actions and compromise the RAG of additional applications. We evaluate the performance of the worm in creating a chain of confidential user data extraction within a GenAI ecosystem of GenAI-powered email assistants and analyze how the performance of the worm is affected by the size of the context, the adversarial self-replicating prompt used, the type and size of the embedding algorithm employed, and the number of hops in the propagation. Finally, we introduce the Virtual Donkey, a guardrail intended to detect and prevent the propagation of Morris-II with minimal latency, high accuracy, and a low false-positive rate. We evaluate the guardrail's performance and show that it yields a perfect true-positive rate of 1.0 with a false-positive rate of 0.015, and is robust against out-of-distribution worms, consisting of unseen jailbreaking commands, a different email dataset, and various worm usecases.

References (37)

Citations (11)

View on Semantic Scholar

Summary

The paper presents Morris II, the first zero-click worm targeting GenAI ecosystems using adversarial self-replicating prompts.
It details a methodology that leverages replication, propagation, and precise attack execution in both RAG-based and application flow scenarios.
The findings highlight an urgent need for enhanced security measures and further research into defenses against adversarial prompt-based attacks.

Unleashing Zero-click Worms on GenAI Ecosystems: The Morris II Malware

In recent years, the integration of Generative AI (GenAI) into applications has led to the formation of interconnected ecosystems of semi/fully autonomous agents powered by advanced AI services. While previous research has focused on specific risks associated with individual GenAI components, such as dialog poisoning, membership inference, and prompt leaking, a critical gap remains: whether attackers can exploit the GenAI layer to develop self-propagating malware targeting the entire ecosystem. This paper, authored by Stav Cohen, Ron Bitton, and Ben Nassi, introduces Morris II, the first worm designed to target GenAI ecosystems through adversarial self-replicating prompts.

Overview of Morris II Worm

Morris II draws inspiration from the original Morris Worm but differs by targeting GenAI-powered environments. The worm exploits the inherent connectivity of these ecosystems, using adversarial prompts to traverse and infect a network of GenAI agents. The worm demonstrates three key properties: replication, propagation, and executing malicious activities.

Replication

Replication within Morris II relies on adversarial self-replicating prompts, which compel GenAI models to replicate the prompt. This is similar in concept to SQL injection or buffer overflow attacks, where code execution is manipulated to achieve replication. The paper explains two forms of these prompts:

Direct Replication: The GenAI model directly outputs the input prompt.
Conditional Replication: A prompt embedded within larger input data triggers the output, embedded with the prompt and additional malicious content.

Propagation

Propagation occurs through the agents' application logic, leveraging the algorithms and policies that dictate the agents' interactions within the ecosystem. The paper discusses two primary modes:

RAG-Based Propagation: Involves poisoning the Retrieval-Augmented Generation (RAG) databases, compelling the agents to include adversarial prompts in their responses, which then propagate the infection.
Application Flow Steering: Tailoring inputs to direct the flow of applications towards actions that propagate the worm, such as forwarding malicious emails.

Experimental Evaluation

The authors thoroughly evaluate Morris II in real-world scenarios:

They deploy Morris II against GenAI-powered email assistants using RAG and non-RAG-based architectures.
The experiments cover black-box and white-box settings, showing successful worm propagation and execution of malicious payloads across multiple GenAI models.

Numerical Results

The researchers provide strong quantitative results, illustrating the efficacy of Morris II:

Success Rates: High success rates in replication and payload execution, particularly in steering application flows.
Error Rates & Precision: Low error rates and high precision in terms of attack execution, confirming the robustness of the adversarial prompts.
Propagation Rates: Demonstrated effective propagation across GenAI models like Gemini Pro, ChatGPT 4.0, and LLaVA.

These results are pivotal as they not only confirm the worm's capabilities but also provide a potent example of the risks associated with GenAI ecosystems.

Practical and Theoretical Implications

Practically, the paper's findings necessitate immediate action:

Security Prioritization: Companies must prioritize securing GenAI integrations.
Detection Mechanisms: Developing robust detection and prevention mechanisms tailored to GenAI ecosystems.

Theoretically, it opens new research avenues:

Adversarial Defenses: Investigating advanced defenses against adversarial self-replicating prompts.
Worm Characteristics: Furthering the understanding of unique malware features in GenAI contexts.

Future Developments

Given the rapid pace of GenAI adoption, future research will likely focus on:

Enhanced Defense Mechanisms: Developing algorithms specifically designed to detect and mitigate the propagation of such worms.
Broader Application Scenarios: Examining the threats posed by GenAI malware in other critical areas like healthcare, finance, and industrial control systems.

Conclusion

Morris II represents a significant advancement in understanding the security implications of GenAI ecosystems. It emphasizes the urgency for developing robust security measures to counteract potential threats posed by adversarial machine learning techniques. The findings of Cohen, Bitton, and Nassi underscore the importance of preemptive action and continuous research to safeguard the ever-expanding landscape of GenAI-powered applications.

PDF Markdown

Related Papers

Tweets

https://twitter.com/lilacj4de/status/1887131523616493839

https://twitter.com/sujal_maiti/status/1769699101816356887

https://twitter.com/niplav_site/status/1849917234422157548

https://twitter.com/_junaidkhalid1/status/1766068840750874807

https://twitter.com/delarosaceae/status/1768749324886806880

https://twitter.com/XtalSlow/status/1765785583933161503

YouTube

Show All Videos

HackerNews

Here Comes the AI Worm (5 points, 0 comments)

Reddit

Does anyone know how does MS-Copilot/Graph Semantic Index will defend against this attack vector? (3 points, 7 comments)