Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
38 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection (2302.12173v2)

Published 23 Feb 2023 in cs.CR, cs.AI, cs.CL, and cs.CY
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

Abstract: LLMs are increasingly being integrated into various applications. The functionalities of recent LLMs can be flexibly modulated via natural language prompts. This renders them susceptible to targeted adversarial prompting, e.g., Prompt Injection (PI) attacks enable attackers to override original instructions and employed controls. So far, it was assumed that the user is directly prompting the LLM. But, what if it is not the user prompting? We argue that LLM-Integrated Applications blur the line between data and instructions. We reveal new attack vectors, using Indirect Prompt Injection, that enable adversaries to remotely (without a direct interface) exploit LLM-integrated applications by strategically injecting prompts into data likely to be retrieved. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities, including data theft, worming, information ecosystem contamination, and other novel security risks. We demonstrate our attacks' practical viability against both real-world systems, such as Bing's GPT-4 powered Chat and code-completion engines, and synthetic applications built on GPT-4. We show how processing retrieved prompts can act as arbitrary code execution, manipulate the application's functionality, and control how and if other APIs are called. Despite the increasing integration and reliance on LLMs, effective mitigations of these emerging threats are currently lacking. By raising awareness of these vulnerabilities and providing key insights into their implications, we aim to promote the safe and responsible deployment of these powerful models and the development of robust defenses that protect users and systems from potential attacks.

Indirect Prompt Injection Attacks on LLM-Integrated Applications

The paper "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection" explores a critical security vulnerability in LLMs integrated into various applications. The authors highlight a new category of attacks termed Indirect Prompt Injection (IPI), presenting it as an uncharted threat vector in the field of LLM security.

Summary of the Paper

The core focus of the paper is on Indirect Prompt Injection, an attack mechanism where adversaries strategically inject prompts into data that the LLM is likely to retrieve during inference. This type of attack does not require direct user interaction with the LLM, thereby expanding the attack surface significantly. The authors argue that the integration of LLMs into applications, such as search engines and email clients, blurs the line between executable instructions and data, making the models susceptible to remote exploitation.

The research presented in the paper includes:

  • Taxonomy of Threats: The authors develop a comprehensive taxonomy from a computer security perspective to systematically investigate the impacts and vulnerabilities associated with IPI, including data theft, system control, and misinformation propagation.
  • Attack Demonstrations: Practical feasibility of IPI attacks is demonstrated on both real-world systems, such as Bing's GPT-4 powered Chat and code-completion engines, as well as synthetic applications built on GPT-4. Various attack scenarios such as malware distribution, information gathering, fraud, and system intrusion are meticulously executed and analyzed.
  • Discussion on Mitigations: While effective mitigations are currently lacking, the authors underscore the necessity of developing robust defenses to safeguard LLM-integrated applications against these emerging threats.

Key Findings and Implications

The research brings to light several important findings:

  1. Remote Control and Intrusion: The authors show how Indirect Prompt Injection can lead to full compromise of the LLM at inference time. This can enable remote control over the application, persistent system compromise, and arbitrary code execution.
  2. Dynamic and Autonomous Threats: The paper points out that the adversaries can achieve complex manipulations, such as disinformation campaigns and user manipulation, by leveraging LLM’s capabilities to autonomously interact with users and execute tasks.
  3. Multifaceted Attack Vectors: The introduction of retrieval-augmented models allows adversaries to exploit LLMs through both passive methods (e.g., poisoning public sources) and active methods (e.g., phishing emails). The potential attacks are multi-staged and can hide and encode prompts to evade detection and filtering mechanisms.
  4. Impact on Trust and Reliability: The paper emphasizes that LLM-integrated applications can significantly amplify the impact of these attacks due to their large user base and the high level of trust users place in AI-generated responses.

Future Directions

Given the evolving integration of LLMs in applications, the paper outlines several avenues for future research:

  • Secure Model Training and Fine-Tuning: Further research is required to develop robust training techniques that can immunize LLMs against indirect prompt injections.
  • Advanced Filtering and Detection Mechanisms: Development of sophisticated filtering systems capable of detecting malicious prompts while interpreting retrieved data and instructions.
  • Evaluation of Autonomy in AI Systems: Comprehensive studies on the security implications of more autonomous systems, including multi-agent frameworks and AI systems designed for autonomous planning and execution.
  • User and System-Level Mitigations: Investigate practical, user-friendly solutions that can alert users to potential manipulations and develop system-level guardrails to mitigate risks.

Conclusion

The paper contributes a critical perspective by presenting Indirect Prompt Injection as a novel and significant security threat in LLM-integrated applications. The implications of these findings highlight the urgent need for the cybersecurity community to address these vulnerabilities proactively. Moving forward, the development of secure and resilient AI systems will be integral to ensuring the safe deployment and ethical utilization of LLMs in real-world applications.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Kai Greshake (1 paper)
  2. Sahar Abdelnabi (21 papers)
  3. Shailesh Mishra (6 papers)
  4. Christoph Endres (2 papers)
  5. Thorsten Holz (52 papers)
  6. Mario Fritz (160 papers)
Citations (305)
Github Logo Streamline Icon: https://streamlinehq.com
Youtube Logo Streamline Icon: https://streamlinehq.com