Here Comes The AI Worm: Unleashing Zero-click Worms that Target GenAI-Powered Applications (2403.02817v2)
Abstract: In this paper, we show that when the communication between GenAI-powered applications relies on RAG-based inference, an attacker can initiate a computer worm-like chain reaction that we call Morris-II. This is done by crafting an adversarial self-replicating prompt that triggers a cascade of indirect prompt injections within the ecosystem, forcing each affected application to perform malicious actions and compromise the RAG of additional applications. We evaluate the worm's performance in creating a chain of confidential user data extraction within an ecosystem of GenAI-powered email assistants, and we analyze how that performance is affected by the size of the context, the adversarial self-replicating prompt used, the type and size of the embedding algorithm employed, and the number of hops in the propagation. Finally, we introduce the Virtual Donkey, a guardrail intended to detect and prevent the propagation of Morris-II with minimal latency, high accuracy, and a low false-positive rate. We evaluate the guardrail's performance and show that it yields a true-positive rate of 1.0 with a false-positive rate of 0.015, and that it is robust against out-of-distribution worms consisting of unseen jailbreaking commands, a different email dataset, and various worm use cases.
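
For readers unfamiliar with the two guardrail metrics quoted in the abstract, the following minimal sketch shows how a true-positive rate and a false-positive rate are computed from a detector's confusion counts. The function name `detection_rates` and the counts passed to it are hypothetical, chosen only so the arithmetic lands on the reported rates; they are not the evaluation data from the paper.

```python
def detection_rates(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Return (true-positive rate, false-positive rate) from confusion counts."""
    tpr = tp / (tp + fn)  # share of worm-carrying inputs that were flagged
    fpr = fp / (fp + tn)  # share of benign inputs that were wrongly flagged
    return tpr, fpr


# Hypothetical counts (not from the paper): 200 worm emails, all flagged;
# 1000 benign emails, 15 of them flagged by mistake.
tpr, fpr = detection_rates(tp=200, fn=0, fp=15, tn=985)
print(f"TPR={tpr:.3f}, FPR={fpr:.3f}")
```

A true-positive rate of 1.0 means no worm-carrying input in the test set was missed, while a false-positive rate of 0.015 means roughly 15 in every 1000 benign inputs would be flagged.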