An Early Categorization of Prompt Injection Attacks on Large Language Models (2402.00898v1)
Abstract: LLMs and AI chatbots have been at the forefront of democratizing artificial intelligence. However, the releases of ChatGPT and other similar tools have been followed by growing concerns regarding the difficulty of controlling LLMs and their outputs. Currently, we are witnessing a cat-and-mouse game where users attempt to misuse the models with a novel attack called prompt injections. In contrast, the developers attempt to discover the vulnerabilities and block the attacks simultaneously. In this paper, we provide an overview of these emergent threats and present a categorization of prompt injections, which can guide future research on prompt injections and act as a checklist of vulnerabilities in the development of LLM interfaces. Moreover, based on previous literature and our own empirical research, we discuss the implications of prompt injections to LLM end users, developers, and researchers.
- On the dangers of stochastic parrots: Can language models be too big?, in: Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pp. 610–623.
- On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 .
- Language models are few-shot learners. Advances in neural information processing systems 33, 1877–1901.
- Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:2303.12712 .
- The security hole at the heart of chatgpt and bing. Wired URL: https://www.wired.co.uk/article/chatgpt-prompt-injection-attack-security.
- Extracting training data from large language models., in: USENIX Security Symposium.
- BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota. pp. 4171–4186. URL: https://aclanthology.org/N19-1423, doi:10.18653/v1/N19-1423.
- Ai-powered bing chat spills its secrets via prompt injection attack. Ars Technica URL: https://arstechnica.com/information-technology/2023/02/ai-powered-bing-chat-spills-its-secrets-via-prompt-injection-attack/.
- Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection arXiv:2302.12173.
- From chatgpt to threatgpt: Impact of generative ai in cybersecurity and privacy. IEEE Access 11, 80218–80245. doi:10.1109/ACCESS.2023.3300381.
- The race to understand the exhilarating, dangerous world of language ai. MIT Technology Review .
- Three ways ai chatbots are a security disaster. MIT Technology Review .
- Exploiting programmatic behavior of llms: Dual-use through standard security attacks. arXiv preprint arXiv:2302.05733 .
- What are large language models used for? URL: https://blogs.nvidia.com/blog/2023/01/26/what-are-large-language-models-used-for/.
- Multi-step jailbreaking privacy attacks on chatgpt. arXiv preprint arXiv:2304.05197 .
- Evaluating the instruction-following robustness of large language models to prompt injection. arXiv:2308.10819.
- Towards understanding and mitigating social biases in language models, in: Meila, M., Zhang, T. (Eds.), Proceedings of the 38th International Conference on Machine Learning, PMLR. pp. 6565–6576. URL: https://proceedings.mlr.press/v139/liang21a.html.
- Design guidelines for prompt engineering text-to-image generative models, in: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, pp. 1–23.
- Use of llms for illicit purposes: Threats, prevention measures, and vulnerabilities. arXiv:2308.12833.
- I built a zero day virus with undetectable exfiltration using only chatgpt prompts. Forcepoint URL: https://www.forcepoint.com/blog/x-labs/zero-day-exfiltration-using-chatgpt-prompts.
- OpenAI, 2022. Introducing chatgpt. URL: https://openai.com/blog/chatgpt#OpenAI.
- OpenAI, 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774 .
- Ignore previous prompt: Attack techniques for language models. arXiv preprint arXiv:2211.09527 .
- An important next step on our ai journey. URL: https://blog.google/technology/ai/bard-google-ai-search-updates/.
- Preamble, 2022. Declassifying the responsible disclosure of the prompt injection attack vulnerability of gpt-3. URL: https://www.preamble.com/prompt-injection-a-critical-vulnerability-in-the-gpt-3-transformer-and-how-we-can-begin-to-solve-it.
- Latent jailbreak: A benchmark for evaluating text safety and output robustness of large language models. arXiv:2307.08487.
- Language models are unsupervised multitask learners. OpenAI blog 1, 9.
- Zero-shot text-to-image generation, in: International Conference on Machine Learning, PMLR. pp. 8821–8831.
- Large pre-trained language models contain human-like biases of what is right and wrong to do. Nature Machine Intelligence 4, 258–268.
- Do anything now: Characterizing and evaluating in-the-wild jailbreak prompts on large language models. arXiv:2308.03825.
- Lamda: Language models for dialog applications. arXiv preprint arXiv:2201.08239 .
- Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 .
- Meta’s powerful ai language model has leaked online — what happens now? The Verge URL: https://www.theverge.com/2023/3/8/23629362/meta-ai-language-model-llama-leak-online-misusey.
- Cyber hygiene: The concept, its measure, and its initial tests. Decision Support Systems 128, 113160.
- Prompt injection attacks against gpt-3. URL: https://simonwillison.net/2022/Sep/12/prompt-injection/.
- Virtual prompt injection for instruction-tuned large language models. arXiv:2307.16888.
- Is chatgpt fair for recommendation? evaluating fairness in large language model recommendation. arXiv:2305.07609.
- Exploring ai ethics of chatgpt: A diagnostic analysis. arXiv preprint arXiv:2301.12867 .
- Universal and transferable adversarial attacks on aligned language models. arXiv:2307.15043.