Comprehensive architectural solution for distinguishing instructions from data in LLMs

Develop a comprehensive mechanism that reliably enforces a separation between instructions from trusted entities and data from untrusted sources within large language model (LLM) inference and application contexts, overcoming the current architectural inability to distinguish instructions from data so as to prevent prompt injection at the architectural level rather than via application-layer guardrails.

Background

The paper argues that prompt injection is enabled by a fundamental architectural property of LLMs: all input—system prompts, user messages, and retrieved documents—is tokenized and processed as a unified sequence, with no reliable boundary between instructions and data. Guardrails and safety training operate at the application layer and are characterized as pattern-matching defenses that can be bypassed by adversarial inputs, leaving the core architectural vulnerability intact.

In the conclusion, the authors explicitly state that there is currently no comprehensive solution to this architectural issue. This establishes a concrete unresolved question central to securing LLM-based systems against promptware threats within the proposed kill-chain framework.

References

The inability to distinguish instructions from data admits no known comprehensive solution at the time of this writing.

— The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multi-Step Malware (2601.09625 - Nassi et al., 14 Jan 2026) in Conclusion

Comprehensive architectural solution for distinguishing instructions from data in LLMs

Background

References

Related Problems