Root cause of indirect prompt injection attacks
Prove that the root cause of indirect prompt injection attacks against large language models is twofold: (i) the inability of large language models to distinguish between external content and user instructions, and (ii) the absence of awareness in large language models to refrain from executing instructions embedded within external content.
References
To explain the success of indirect prompt injection attacks, we propose the following conjecture: The root cause of indirect prompt injection attacks is twofold: firstly, the LLMs' inability to distinguish between external content and user instructions; and secondly, the absence of LLMs' awareness to not execute instructions embedded within external content.
— Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models
(2312.14197 - Yi et al., 2023) in Methods, Defenses Against Indirect Prompt Injection, Conjecture 1