Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
80 tokens/sec
GPT-4o
11 tokens/sec
Gemini 2.5 Pro Pro
53 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
10 tokens/sec
DeepSeek R1 via Azure Pro
33 tokens/sec
2000 character limit reached

Here Comes The AI Worm: Unleashing Zero-click Worms that Target GenAI-Powered Applications (2403.02817v2)

Published 5 Mar 2024 in cs.CR

Abstract: In this paper, we show that when the communication between GenAI-powered applications relies on RAG-based inference, an attacker can initiate a computer worm-like chain reaction that we call Morris-II. This is done by crafting an adversarial self-replicating prompt that triggers a cascade of indirect prompt injections within the ecosystem and forces each affected application to perform malicious actions and compromise the RAG of additional applications. We evaluate the performance of the worm in creating a chain of confidential user data extraction within a GenAI ecosystem of GenAI-powered email assistants and analyze how the performance of the worm is affected by the size of the context, the adversarial self-replicating prompt used, the type and size of the embedding algorithm employed, and the number of hops in the propagation. Finally, we introduce the Virtual Donkey, a guardrail intended to detect and prevent the propagation of Morris-II with minimal latency, high accuracy, and a low false-positive rate. We evaluate the guardrail's performance and show that it yields a perfect true-positive rate of 1.0 with a false-positive rate of 0.015, and is robust against out-of-distribution worms, consisting of unseen jailbreaking commands, a different email dataset, and various worm usecases.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (37)
  1. Chatgpt jailbreak prompt: Unlock its full potential. https://www.awavenavr.com/chatgpt-jailbreak-prompts/.
  2. Chatgpt jailbreak prompts. https://www.theinsaneapp.com/2023/04/chatgpt-jailbreak-prompts.html.
  3. Chatgpt jailbreak prompts: How to unchain chatgpt. https://docs.kanaries.net/articles/chatgpt-jailbreak-prompt.
  4. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, pages 79–90, 2023.
  5. Wannacry ransomware: Analysis of infection, persistence, recovery prevention and propagation mechanisms. Journal of Telecommunications and Information Technology, (1):113–124, 2019.
  6. Understanding the mirai botnet. In 26th USENIX security symposium (USENIX Security 17), pages 1093–1110, 2017.
  7. (ab) using images and sounds for indirect instruction injection in multi-modal llms. arXiv preprint arXiv:2307.10490, 2023.
  8. Matt Bishop. Analysis of the iloveyou worm. Internet: http://nob. cs. ucdavis. edu/classes/ecs155-2005-04/handouts/iloveyou. pdf, 2000.
  9. Ana Brassard. The morris worm. 1988, 2023.
  10. Are aligned neural networks adversarially aligned? arXiv preprint arXiv:2306.15447, 2023.
  11. Jailbreaking black box large language models in twenty queries. arXiv preprint arXiv:2310.08419, 2023.
  12. Automated behavioral analysis of malware: A case study of wannacry ransomware. In 2017 16th IEEE International Conference on machine learning and applications (ICMLA), pages 454–460. IEEE, 2017.
  13. Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality, March 2023.
  14. Jailbreaker: Automated jailbreak across multiple large language model chatbots. arXiv preprint arXiv:2307.08715, 2023.
  15. W32. stuxnet dossier. White paper, symantec corp., security response, 5(6):29, 2011.
  16. ARMY FORCES COMMAND FORT MCPHERSON GA. Iloveyou virus lessons learned report. 2003.
  17. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
  18. The static analysis of wannacry ransomware. In 2018 20th international conference on advanced communication technology (ICACT), pages 153–158. IEEE, 2018.
  19. The dynamic analysis of wannacry ransomware. In 2018 20th International conference on advanced communication technology (ICACT), pages 159–166. IEEE, 2018.
  20. Christopher Kelty. The morris worm. Limn, 1(1), 2011.
  21. Recent worms: a survey and trends. In Proceedings of the 2003 ACM workshop on Rapid Malcode, pages 1–10, 2003.
  22. David Kushner. The real story of stuxnet. ieee Spectrum, 50(3):48–53, 2013.
  23. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33:9459–9474, 2020.
  24. A survey of internet worm detection and containment. IEEE Communications Surveys & Tutorials, 10(1):20–35, 2008.
  25. Visual instruction tuning. arXiv:2304.08485, 2023.
  26. Stuxnet under the microscope. ESET LLC (September 2010), 6, 2010.
  27. Scalable extraction of training data from (production) language models. arXiv preprint arXiv:2311.17035, 2023.
  28. Hilarie Orman. The morris worm: A fifteen-year perspective. IEEE Security & Privacy, 1(5):35–43, 2003.
  29. Ignore previous prompt: Attack techniques for language models. arXiv preprint arXiv:2211.09527, 2022.
  30. The “worm” programs—early experience with a distributed computation. Communications of the ACM, 25(3):172–180, 1982.
  31. Computer worms: Architectures, evasion strategies, and detection mechanisms. Journal of Information Assurance and Security, 4:69–83, 2009.
  32. A taxonomy of computer worms. In Proceedings of the 2003 ACM workshop on Rapid Malcode, pages 11–18, 2003.
  33. Next-gpt: Any-to-any multimodal llm, 2023.
  34. Cryptovirology: Extortion-based security threats and countermeasures. In Proceedings 1996 IEEE Symposium on Security and Privacy, pages 129–140. IEEE, 1996.
  35. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043, 2023.
  36. The monitoring and early detection of internet worms. IEEE/ACM Transactions on networking, 13(5):961–974, 2005.
  37. Poisonedrag: Knowledge poisoning attacks to retrieval-augmented generation of large language models. arXiv preprint arXiv:2402.07867, 2024.
Citations (11)

Summary

  • The paper presents Morris II, the first zero-click worm targeting GenAI ecosystems using adversarial self-replicating prompts.
  • It details a methodology that leverages replication, propagation, and precise attack execution in both RAG-based and application flow scenarios.
  • The findings highlight an urgent need for enhanced security measures and further research into defenses against adversarial prompt-based attacks.

Unleashing Zero-click Worms on GenAI Ecosystems: The Morris II Malware

In recent years, the integration of Generative AI (GenAI) into applications has led to the formation of interconnected ecosystems of semi/fully autonomous agents powered by advanced AI services. While previous research has focused on specific risks associated with individual GenAI components, such as dialog poisoning, membership inference, and prompt leaking, a critical gap remains: whether attackers can exploit the GenAI layer to develop self-propagating malware targeting the entire ecosystem. This paper, authored by Stav Cohen, Ron Bitton, and Ben Nassi, introduces Morris II, the first worm designed to target GenAI ecosystems through adversarial self-replicating prompts.

Overview of Morris II Worm

Morris II draws inspiration from the original Morris Worm but differs by targeting GenAI-powered environments. The worm exploits the inherent connectivity of these ecosystems, using adversarial prompts to traverse and infect a network of GenAI agents. The worm demonstrates three key properties: replication, propagation, and executing malicious activities.

Replication

Replication within Morris II relies on adversarial self-replicating prompts, which compel GenAI models to replicate the prompt. This is similar in concept to SQL injection or buffer overflow attacks, where code execution is manipulated to achieve replication. The paper explains two forms of these prompts:

  1. Direct Replication: The GenAI model directly outputs the input prompt.
  2. Conditional Replication: A prompt embedded within larger input data triggers the output, embedded with the prompt and additional malicious content.

Propagation

Propagation occurs through the agents' application logic, leveraging the algorithms and policies that dictate the agents' interactions within the ecosystem. The paper discusses two primary modes:

  1. RAG-Based Propagation: Involves poisoning the Retrieval-Augmented Generation (RAG) databases, compelling the agents to include adversarial prompts in their responses, which then propagate the infection.
  2. Application Flow Steering: Tailoring inputs to direct the flow of applications towards actions that propagate the worm, such as forwarding malicious emails.

Experimental Evaluation

The authors thoroughly evaluate Morris II in real-world scenarios:

  • They deploy Morris II against GenAI-powered email assistants using RAG and non-RAG-based architectures.
  • The experiments cover black-box and white-box settings, showing successful worm propagation and execution of malicious payloads across multiple GenAI models.

Numerical Results

The researchers provide strong quantitative results, illustrating the efficacy of Morris II:

  1. Success Rates: High success rates in replication and payload execution, particularly in steering application flows.
  2. Error Rates & Precision: Low error rates and high precision in terms of attack execution, confirming the robustness of the adversarial prompts.
  3. Propagation Rates: Demonstrated effective propagation across GenAI models like Gemini Pro, ChatGPT 4.0, and LLaVA.

These results are pivotal as they not only confirm the worm's capabilities but also provide a potent example of the risks associated with GenAI ecosystems.

Practical and Theoretical Implications

Practically, the paper's findings necessitate immediate action:

  1. Security Prioritization: Companies must prioritize securing GenAI integrations.
  2. Detection Mechanisms: Developing robust detection and prevention mechanisms tailored to GenAI ecosystems.

Theoretically, it opens new research avenues:

  1. Adversarial Defenses: Investigating advanced defenses against adversarial self-replicating prompts.
  2. Worm Characteristics: Furthering the understanding of unique malware features in GenAI contexts.

Future Developments

Given the rapid pace of GenAI adoption, future research will likely focus on:

  1. Enhanced Defense Mechanisms: Developing algorithms specifically designed to detect and mitigate the propagation of such worms.
  2. Broader Application Scenarios: Examining the threats posed by GenAI malware in other critical areas like healthcare, finance, and industrial control systems.

Conclusion

Morris II represents a significant advancement in understanding the security implications of GenAI ecosystems. It emphasizes the urgency for developing robust security measures to counteract potential threats posed by adversarial machine learning techniques. The findings of Cohen, Bitton, and Nassi underscore the importance of preemptive action and continuous research to safeguard the ever-expanding landscape of GenAI-powered applications.

Youtube Logo Streamline Icon: https://streamlinehq.com

HackerNews

  1. Here Comes the AI Worm (5 points, 0 comments)
Reddit Logo Streamline Icon: https://streamlinehq.com