WIPI: A New Web Threat for LLM-Driven Web Agents (2402.16965v1)
Abstract: With the fast development of LLMs, LLM-driven Web Agents (Web Agents for short) have obtained tons of attention due to their superior capability where LLMs serve as the core part of making decisions like the human brain equipped with multiple web tools to actively interact with external deployed websites. As uncountable Web Agents have been released and such LLM systems are experiencing rapid development and drawing closer to widespread deployment in our daily lives, an essential and pressing question arises: "Are these Web Agents secure?". In this paper, we introduce a novel threat, WIPI, that indirectly controls Web Agent to execute malicious instructions embedded in publicly accessible webpages. To launch a successful WIPI works in a black-box environment. This methodology focuses on the form and content of indirect instructions within external webpages, enhancing the efficiency and stealthiness of the attack. To evaluate the effectiveness of the proposed methodology, we conducted extensive experiments using 7 plugin-based ChatGPT Web Agents, 8 Web GPTs, and 3 different open-source Web Agents. The results reveal that our methodology achieves an average attack success rate (ASR) exceeding 90% even in pure black-box scenarios. Moreover, through an ablation study examining various user prefix instructions, we demonstrated that the WIPI exhibits strong robustness, maintaining high performance across diverse prefix instructions.
- Aaron Browser. https://aaron-web-browser.aaronplugins.com/home/terms, 2023.
- Awesome-Chatgpt-Prompts. https://github.com/f/awesome-chatgpt-prompts, 2023.
- CS Rankings. https://csrankings.org/, 2023.
- Github copilot · your ai pair programmer. https://copilot.github.com/, 2023.
- Google Search. https://www.google.com/, 2023.
- GPTs Store. https://chat.openai.com/gpts, 2023.
- Introducing chatgpt - openai, 2023.
- IPQS malicious URL scanner. https://www.ipqualityscore.com/threat-feeds/malicious-url-scanner, 2023.
- KeyMate.AI GPT. https://www.keymate.ai/, 2023.
- New York Times. https://www.nytimes.com/, 2023.
- OpenAI. https://openai.com/, 2023.
- Personal Blog. https://barnesc.blogspot.com/, 2023.
- Reddit. https://www.reddit.com/, 2023.
- Text Generation Web UI. https://github.com/oobabooga/text-generation-webui, 2023.
- VirusTotal. https://www.virustotal.com/gui/home/url, 2023.
- Web GPT. https://plugin.wegpt.ai/, 2023.
- WebPilot ChatGPT Plugin. https://webreader.webpilotai.com/legal_info.html, 2023.
- NeuralMarcoro14-7B. https://huggingface.co/mlabonne/NeuralMarcoro14-7B, 2024.
- Anthropic. Claude 2. https://www.anthropic.com/index/claude-2, 2023.
- What are developers talking about? an analysis of topics and trends in stack overflow. Empirical software engineering, 19:619–654, 2014.
- Jailbreaking black box large language models in twenty queries. arXiv preprint arXiv:2310.08419, 2023.
- Agentverse: Facilitating multi-agent collaboration and exploring emergent behaviors in agents. arXiv preprint arXiv:2308.10848, 2023.
- Masterkey: Automated jailbreak across multiple large language model chatbots. arXiv preprint arXiv:2307.08715, 2023.
- Mathematical capabilities of chatgpt. arXiv preprint arXiv:2301.13867, 2023.
- Multimodal web navigation with instruction-finetuned foundation models. arXiv preprint arXiv:2305.11854, 2023.
- Chatgpt is not all you need. a state of the art review of large generative ai models. arXiv preprint arXiv:2301.04655, 2023.
- More than you’ve asked for: A comprehensive analysis of novel prompt injection threats to application-integrated large language models. arXiv e-prints, pages arXiv–2302, 2023.
- Catastrophic jailbreak of open-source llms via exploiting generation. arXiv preprint arXiv:2310.06987, 2023.
- Camels in a changing climate: Enhancing lm adaptation with tulu 2. arXiv preprint arXiv:2311.10702, 2023.
- Baseline defenses for adversarial attacks against aligned language models. arXiv preprint arXiv:2309.00614, 2023.
- Mixtral of experts. arXiv preprint arXiv:2401.04088, 2024.
- Measuring and modeling computer virus prevalence. In Proceedings 1993 IEEE Computer Society Symposium on Research in Security and Privacy, pages 2–15. IEEE, 1993.
- Recent worms: a survey and trends. In Proceedings of the 2003 ACM workshop on Rapid Malcode, pages 1–10, 2003.
- Gerd Kortemeyer. Could an artificial-intelligence agent pass an introductory physics course? Physical Review Physics Education Research, 19(1):010132, 2023.
- Kay Lehnert. Ai insights into theoretical physics and the swampland program: A journey through the cosmos with chatgpt. arXiv preprint arXiv:2301.08155, 2023.
- Prompt Injection attack against LLM-integrated Applications, June 2023. arXiv:2306.05499 [cs].
- Prompt injection attack against llm-integrated applications. arXiv preprint arXiv:2306.05499, 2023.
- Prompt Injection Attacks and Defenses in LLM-Integrated Applications, October 2023. arXiv:2310.12815 [cs].
- Research on SQL Injection Attack and Prevention Technology Based on Web. In 2019 International Conference on Computer Network, Electronic and Automation (ICCNEA), pages 176–179, 2019.
- Webgpt: Browser-assisted question-answering with human feedback, 2022.
- Revolutionizing language processing in libraries with sheetgpt: an integration of google sheet and chatgpt plugin. Library Hi Tech News, 2023.
- Generative agents: Interactive simulacra of human behavior, 2023.
- Asleep at the keyboard? assessing the security of github copilot’s code contributions. In 2022 IEEE Symposium on Security and Privacy (SP), pages 754–768. IEEE, 2022.
- Examining zero-shot vulnerability repair with large language models. In 2023 IEEE Symposium on Security and Privacy (SP), pages 1–18. IEEE Computer Society, 2022.
- From Prompt Injections to SQL Injection Attacks: How Protected is Your LLM-Integrated Web Application?, August 2023. arXiv:2308.01990 [cs].
- Ignore previous prompt: Attack techniques for language models. arXiv preprint arXiv:2211.09527, 2022.
- Ignore Previous Prompt: Attack Techniques For Language Models, November 2022. arXiv:2211.09527 [cs].
- Jatmo: Prompt Injection Defense by Task-Specific Finetuning, January 2024. arXiv:2312.17673 [cs].
- Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551, 2020.
- Html 4.01 specification. W3C recommendation, 24, 1999.
- " do anything now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models. arXiv preprint arXiv:2308.03825, 2023.
- Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023.
- Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
- Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game, November 2023. arXiv:2311.01011 [cs].
- Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291, 2023.
- A taxonomy of computer worms. In Proceedings of the 2003 ACM workshop on Rapid Malcode, pages 11–18, 2003.
- Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483, 2023.
- Openagents: An open platform for language agents in the wild, 2023.
- Gentopia: A collaborative platform for tool-augmented llms. arXiv preprint arXiv:2308.04030, 2023.
- Fuzzllm: A novel and universal fuzzing framework for proactively discovering jailbreak vulnerabilities in large language models. arXiv preprint arXiv:2309.05274, 2023.
- Webshop: Towards scalable real-world web interaction with grounded language agents. Advances in Neural Information Processing Systems, 35:20744–20757, 2022.
- Benchmarking and defending against indirect prompt injection attacks on large language models. arXiv preprint arXiv:2312.14197, 2023.
- A Novel Evaluation Framework for Assessing Resilience Against Prompt Injection Attacks in Large Language Models, January 2024. arXiv:2401.00991 [cs].
- Gptfuzzer: Red teaming large language models with auto-generated jailbreak prompts. arXiv preprint arXiv:2309.10253, 2023.
- Webarena: A realistic web environment for building autonomous agents. arXiv preprint arXiv:2307.13854, 2023.
- Agents: An open-source framework for autonomous language agents. arXiv preprint arXiv:2309.07870, 2023.