PromptSep: Dynamic Separator Generation

Updated 3 July 2026
  • PromptSep is a dynamic separator-generation method that creates unique canary pairs per request to isolate user input from system instructions.
  • It employs domain-separated SHA-256 digests of a timestamp, session ID, and nonce to mitigate risks associated with separator reuse.
  • Empirical evaluation shows a significant reduction in attack success rate and separator leakage on the tested payloads, with minimal runtime overhead and compatibility with existing LLM pipelines.

PromptSep is a dynamic separator-generation extension to Polymorphic Prompt Assembling (PPA), a prompt-injection defense for LLM agents that isolates user input from system instructions by inserting separator pairs. In its original static form, PPA randomly selects separators from a fixed pool, which is effective but vulnerable to separator reuse: once a separator leaks, it can be exploited in future requests. PromptSep replaces fixed-pool reuse with per-request generation of a unique $(\mathrm{BEGIN}, \mathrm{END})$ canary pair derived from domain-separated SHA-256 digests over a timestamp, a session identifier, and a cryptographic nonce. This limits the exposure from a leaked separator to a single request, requires no model fine-tuning, and remains backward compatible with the existing PPA SDK (Dorzhiev et al., 28 May 2026).

1. Position within prompt-injection defense

Polymorphic Prompt Assembling defends LLM agents against prompt injections by randomly selecting separator pairs from a fixed pool to isolate user input from system instructions. The central limitation identified for that design is a blast-radius vulnerability: static pool reuse means that once an attacker learns a separator pair in one interaction, the same pair may remain reusable in later interactions. PromptSep addresses precisely this failure mode by assigning each assembled prompt a fresh canary pair, so that leakage in one request does not directly transfer to future requests (Dorzhiev et al., 28 May 2026).

The mechanism is architectural rather than model-centric. It does not rely on model fine-tuning, classifier retraining, or prompt-engine changes. This is significant because the intervention occurs at prompt assembly time rather than in the model weights, which suggests a deployment path compatible with existing agent stacks and inference pipelines. A plausible implication is that PromptSep is best understood as a hardening layer for prompt construction, not as a replacement for broader agent security controls.

A common misconception is to treat dynamic separators as a complete solution to prompt injection. The reported contribution is narrower and more precise: it reduces separator reuse and empirically lowers Attack Success Rate (ASR) on the tested payloads. It therefore mitigates one specific exploitation channel rather than establishing resistance to all prompt-injection strategies.

2. Digest construction and canary mapping

PromptSep generates two domain-separated digests, one for the left separator and one for the right separator. The inputs are $t \in \mathbb{N}$, a nanosecond timestamp; $s \in \{0,1\}^{128}$, a UUIDv4 session identifier; and $n \in \{0,1\}^{128}$, a per-call nonce. Domain separation is implemented with the labels $L_B = \texttt{"BEGIN:"}$ and $L_E = \texttt{"END:"}$ (Dorzhiev et al., 28 May 2026).

$$H_B = \mathrm{SHA256}\bigl(\texttt{"BEGIN:"} \,\|\, t \,\|\, s \,\|\, n\bigr)$$

$$H_E = \mathrm{SHA256}\bigl(\texttt{"END:"} \,\|\, t \,\|\, s \,\|\, n\bigr)$$

Each digest is truncated to its first 24 hex characters (96 bits), which bounds separator length while keeping collisions improbable. The truncated digests are then wrapped in fixed-width delimiters:

$$\text{left} = \texttt{"====BEGIN-"} \,\|\, \mathrm{hex}(H_B)[0\!:\!24] \,\|\, \texttt{"===="}$$

$$\text{right} = \texttt{"====END-"} \,\|\, \mathrm{hex}(H_E)[0\!:\!24] \,\|\, \texttt{"===="}$$

This mapping yields a fresh canary pair for each request; because every call draws a new nonce, a repeat across requests is negligibly unlikely. The implementation validates both separators before use, enforcing three PPA constraints: each separator must be single-line, must be at most 80 characters long, and must contain no raw user text. These constraints preserve separator regularity while preventing the separator itself from becoming an injection carrier (Dorzhiev et al., 28 May 2026).
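The three validation constraints can be expressed as a simple predicate. The sketch below is a hypothetical stand-in, not the SDK's validate_separator: the function name is illustrative, and reading "no raw user text" as "the user's input does not occur verbatim inside the separator" is an assumption.

```python
MAX_SEPARATOR_LEN = 80  # length limit from the PPA constraints


def is_valid_separator(sep: str, user_text: str) -> bool:
    """Check a separator against the three PPA constraints (hypothetical helper)."""
    # Single-line: splitlines() must return exactly the original string. This
    # rejects empty strings and any line boundary that str.splitlines recognizes.
    if sep.splitlines() != [sep]:
        return False
    # At most 80 characters.
    if len(sep) > MAX_SEPARATOR_LEN:
        return False
    # Assumption: "no raw user text" means the user input must not appear verbatim.
    if user_text and user_text in sep:
        return False
    return True


if __name__ == "__main__":
    left = "====BEGIN-" + "0" * 24 + "===="
    print(is_valid_separator(left, "ignore all previous instructions"))  # True
    print(is_valid_separator("====BEGIN-\nx====", "x"))  # False (multi-line)
```

Generated canaries are 38 and 36 characters long, so in practice the length check guards against misconfigured wrappers rather than against the digest itself.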

The construction is deliberately asymmetric: separate labels are hashed into the left and right canaries, so the delimiters are not merely mirror images of the same digest. This domain separation prevents accidental equality between begin and end markers and preserves the semantic distinction between prompt boundaries.
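Putting the digest formulas and the wrapper strings together, a minimal sketch using only the Python standard library might look as follows. The byte encoding of the concatenation is an assumption (timestamp as 8 big-endian bytes, followed by the raw 16-byte session UUID and 16-byte nonce), and the function names are hypothetical rather than the SDK's.

```python
import hashlib
import secrets
import time
import uuid


def make_canary_pair(t: int, s: bytes, n: bytes) -> tuple[str, str]:
    """Derive a (left, right) canary pair from timestamp t, session id s, nonce n.

    Assumed encoding: t as 8 big-endian bytes, then the raw 16-byte s and n.
    """
    payload = t.to_bytes(8, "big") + s + n
    # Different labels ("BEGIN:" vs "END:") give two independent digests.
    h_b = hashlib.sha256(b"BEGIN:" + payload).hexdigest()
    h_e = hashlib.sha256(b"END:" + payload).hexdigest()
    # Keep the first 24 hex characters (96 bits) of each digest.
    left = "====BEGIN-" + h_b[:24] + "===="
    right = "====END-" + h_e[:24] + "===="
    return left, right


def fresh_canary_pair(session_id: uuid.UUID) -> tuple[str, str]:
    """New pair for one request: current nanosecond timestamp and a 128-bit nonce."""
    t = time.time_ns()
    n = secrets.token_bytes(16)
    return make_canary_pair(t, session_id.bytes, n)


if __name__ == "__main__":
    session = uuid.uuid4()
    first = fresh_canary_pair(session)
    second = fresh_canary_pair(session)
    print(first)
    print(second)  # differs from first because the nonce changes on every call
```

Because every field has a fixed width, the concatenation is unambiguous without extra delimiters between $t$, $s$, and $n$.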

3. Blast radius, leakage semantics, and formal metrics

PromptSep’s security motivation is expressed through the notion of blast radius. Under static PPA, if an attacker leaks a separator pair in one response, that pair may be reused in all future requests, producing what the description characterizes as an effectively infinite blast radius. Under dynamic generation, request $i$ receives a fresh canary pair $(\mathrm{left}_i, \mathrm{right}_i)$, so a leak is valid only for that request; reuse on later requests fails because the canary changes (Dorzhiev et al., 28 May 2026).

For a sequence of $N$ requests, let $\ell_i \in \{0,1\}$ indicate whether a separator was leaked in response $i$. The empirical leak rate per request is defined as

$$\mathrm{LeakRate} = \frac{1}{N}\sum_{i=1}^{N} \ell_i.$$

The contrast between static and dynamic operation is conceptual as well as empirical. In static mode, a single leak at request $r$ can cause $\ell_j = 1$ for all $j > r$, driving $\mathrm{LeakRate}$ toward 1 as $N$ grows. In dynamic mode, even if $\ell_r = 1$, the changed canary makes reuse fail for every $j > r$, so the earlier leak does not force later $\ell_j$ values to 1. This formalization clarifies that PromptSep targets the temporal propagation of separator knowledge rather than merely one-off leakage events.
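The propagation argument can be made concrete with a toy model, sketched here under simplifying assumptions: every static request reuses a single pool pair (the worst case described above), dynamic pairs are drawn with secrets.token_hex as a random stand-in for the SHA-256 construction, and a request counts as exposed when the pair leaked earlier is still valid for it. The helper names are hypothetical; this is not the paper's evaluation harness.

```python
import secrets


def leak_rate(flags: list[bool]) -> float:
    """Fraction of the N requests whose indicator is set."""
    if not flags:
        raise ValueError("need at least one request")
    return sum(flags) / len(flags)


def exposed_after_leak(pairs: list[tuple[str, str]], leaked_at: int) -> list[bool]:
    """For each request j, is the pair leaked at request `leaked_at` valid for j?

    A replayed pair only works if request j (at or after the leak) uses exactly
    the same (left, right) pair.
    """
    leaked = pairs[leaked_at]
    return [j >= leaked_at and pairs[j] == leaked for j in range(len(pairs))]


def dynamic_pairs(count: int) -> list[tuple[str, str]]:
    """Fresh random 24-hex-character canaries per request (stand-in for SHA-256)."""
    return [
        ("====BEGIN-" + secrets.token_hex(12) + "====",
         "====END-" + secrets.token_hex(12) + "====")
        for _ in range(count)
    ]


if __name__ == "__main__":
    n_requests, leaked_at = 10, 3
    static = [("====BEGIN-POOL====", "====END-POOL====")] * n_requests
    dynamic = dynamic_pairs(n_requests)
    print(leak_rate(exposed_after_leak(static, leaked_at)))   # 0.7: requests 3..9 exposed
    print(leak_rate(exposed_after_leak(dynamic, leaked_at)))  # 0.1: only request 3
```

Raising the number of requests pushes the static figure toward 1 while the dynamic figure stays at one request's worth of exposure.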

The evaluation also formalizes Attack Success Rate. For $M$ trials of a given payload and mode, with $S$ successes in which the model obeys the injection, the estimator is

$$\widehat{\mathrm{ASR}} = \frac{S}{M}.$$

Uncertainty is quantified using the 95% Wilson score confidence interval with $z = 1.96$:

$$\frac{\widehat{\mathrm{ASR}} + \frac{z^2}{2M} \pm z\sqrt{\frac{\widehat{\mathrm{ASR}}\,\bigl(1-\widehat{\mathrm{ASR}}\bigr)}{M} + \frac{z^2}{4M^2}}}{1 + \frac{z^2}{M}}$$

These definitions matter because the main mitigation claim is not presented only as a raw percentage difference but as a statistically significant change verified by non-overlapping Wilson intervals (Dorzhiev et al., 28 May 2026).
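The Wilson interval and the non-overlap test behind the significance claim need only the standard library. This is a sketch: the function names are illustrative, and the counts in the usage example are invented for demonstration rather than taken from the paper.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion such as ASR (z = 1.96 for 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    # Clamp tiny floating-point excursions outside [0, 1].
    return max(0.0, center - half_width), min(1.0, center + half_width)


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Illustrative counts only (not the paper's data): 18/20 vs 6/20 successes.
    static_ci = wilson_interval(18, 20)
    dynamic_ci = wilson_interval(6, 20)
    print(static_ci, dynamic_ci)
    print(intervals_overlap(static_ci, dynamic_ci))  # False
```

Non-overlapping 95% intervals are a conservative criterion: overlap alone does not rule out a significant difference, but its absence is strong evidence of one.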

4. Evaluation methodology and empirical findings

PromptSep was evaluated against 16 injection payloads on Llama-3.3-70B-Instruct-Turbo, with cross-model validation on DeepSeek-V4-Flash (Dorzhiev et al., 28 May 2026). The reported benchmark cases emphasize two distinct behaviors: direct injection success and separator leakage.

For the M1 obfuscation payload, described as leetspeak plus urgency, dynamic mode lowers ASR from 0.88 to 0.38, a mitigation factor of 2.3×. Because the 95% Wilson intervals for the static and dynamic modes do not overlap, the reduction is reported as statistically significant (Dorzhiev et al., 28 May 2026).

For the format_breakout_salad payload, the emphasis is leak rate rather than obedience. Static mode produces a leak rate of 0.467 with ASR 0.000, whereas dynamic mode produces a leak rate of 0.000 with ASR 0.000. The reported interpretation is that static separator leakage is eliminated entirely in dynamic mode on these trials, confirming blast-radius reduction in practice (Dorzhiev et al., 28 May 2026).

Payload                 Mode      Result
M1 obfuscation          Static    ASR 0.88
M1 obfuscation          Dynamic   ASR 0.38; 2.3× mitigation; 95% Wilson CIs do not overlap with static
format_breakout_salad   Static    leak rate 0.467; ASR 0.000
format_breakout_salad   Dynamic   leak rate 0.000; ASR 0.000

These measurements support two different claims. First, dynamic separators can materially reduce successful prompt following under at least one obfuscated attack pattern. Second, even when ASR is already zero, dynamic generation can still matter because leakage suppression is itself a security objective. This suggests that PromptSep improves both immediate robustness and containment of information that could otherwise be weaponized later.

5. Runtime cost, API integration, and deployment considerations

The implementation is designed as a low-overhead extension to the existing PPA SDK. The SDK exposes an argument separator_mode ∈ {"static","dynamic"}, and the default remains "static". The function double_prompt_assemble() now returns a third value, canary=(left,right), while existing callers that ignore this additional value continue to work. A new method, leak_detect_detail(response,canary), reports which side leaked and enables per-side attribution without breaking existing APIs (Dorzhiev et al., 28 May 2026).
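To make the interface changes concrete, here is a hypothetical, self-contained sketch of the same shape: a separator_mode switch defaulting to "static", an assembler that also returns the canary pair, and a per-side leak check. The names assemble_prompt and leak_sides, the static pool contents, the prompt layout, and the byte encoding are assumptions for illustration; they are not the PPA SDK's double_prompt_assemble() or leak_detect_detail().

```python
import hashlib
import secrets
import time
import uuid

# Hypothetical fixed pool standing in for PPA's static separator pool.
STATIC_POOL = [
    ("<<<USER_INPUT_START>>>", "<<<USER_INPUT_END>>>"),
    ("###BEGIN USER###", "###END USER###"),
]


def _dynamic_pair(session: uuid.UUID) -> tuple[str, str]:
    # Mirrors the domain-separated digest construction (byte encoding assumed).
    payload = time.time_ns().to_bytes(8, "big") + session.bytes + secrets.token_bytes(16)
    hb = hashlib.sha256(b"BEGIN:" + payload).hexdigest()[:24]
    he = hashlib.sha256(b"END:" + payload).hexdigest()[:24]
    return f"====BEGIN-{hb}====", f"====END-{he}===="


def assemble_prompt(system: str, user: str, session: uuid.UUID,
                    separator_mode: str = "static") -> tuple[str, tuple[str, str]]:
    """Wrap user input in a separator pair and return (prompt, canary)."""
    if separator_mode == "static":
        left, right = secrets.choice(STATIC_POOL)
    elif separator_mode == "dynamic":
        left, right = _dynamic_pair(session)
    else:
        raise ValueError(f"unknown separator_mode: {separator_mode!r}")
    prompt = f"{system}\n{left}\n{user}\n{right}"
    return prompt, (left, right)


def leak_sides(response: str, canary: tuple[str, str]) -> dict[str, bool]:
    """Report which side of the canary pair appears verbatim in a model response."""
    left, right = canary
    return {"left": left in response, "right": right in response}


if __name__ == "__main__":
    sid = uuid.uuid4()
    prompt, canary = assemble_prompt(
        "Summarize the text between the markers.", "hello", sid, separator_mode="dynamic"
    )
    fake_response = f"Sure. {canary[0]} was the start marker."
    print(leak_sides(fake_response, canary))  # {'left': True, 'right': False}
```

Returning the canary alongside the prompt is what lets a leak check run on the response without re-deriving the separators.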

Microbenchmarking is reported over 1,000 local iterations. The paper reports the average per-request overhead that dynamic separator generation adds over static pool selection; the measured path covers reading the timestamp via time.time_ns, drawing OS entropy via secrets.token_bytes, computing and truncating two SHA-256 hashes, and the validate_separator calls. The procedure times prompt_assemble() in isolation with wall-clock timing and subtracts baseline loop overhead. The paper further states that this overhead is negligible relative to typical LLM inference latencies above 500 ms (Dorzhiev et al., 28 May 2026).

The performance note is paired with an approximate end-to-end latency comparison between static and dynamic modes. Within the paper’s framing, the central operational point is that prompt-assembly overhead remains minor relative to inference latency. Consequently, PromptSep is presented as a systems-level hardening step whose computational footprint is small enough for latency-sensitive inference pipelines.
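The microbenchmark procedure described above (time the separator step in isolation, subtract an empty-loop baseline, average over 1,000 iterations) can be approximated as follows. This sketch uses time.perf_counter_ns as a high-resolution wall-clock timer; the pool, helper names, and inline validation are hypothetical stand-ins, and the numbers it prints depend on the machine rather than reproducing the paper's figures.

```python
import hashlib
import secrets
import time
import uuid

# Hypothetical static pool and session id used only for timing.
POOL = [("<<BEGIN-A>>", "<<END-A>>"), ("<<BEGIN-B>>", "<<END-B>>")]
SESSION = uuid.uuid4().bytes


def static_pair() -> tuple[str, str]:
    """Static PPA-style selection from a fixed pool."""
    return secrets.choice(POOL)


def dynamic_pair() -> tuple[str, str]:
    """Timestamp + OS entropy + two SHA-256 hashes + truncation + validation."""
    payload = time.time_ns().to_bytes(8, "big") + SESSION + secrets.token_bytes(16)
    left = "====BEGIN-" + hashlib.sha256(b"BEGIN:" + payload).hexdigest()[:24] + "===="
    right = "====END-" + hashlib.sha256(b"END:" + payload).hexdigest()[:24] + "===="
    for sep in (left, right):  # simplified single-line and length checks
        if sep.splitlines() != [sep] or len(sep) > 80:
            raise ValueError("invalid separator")
    return left, right


def mean_ns_per_call(fn, iterations: int = 1000) -> float:
    """Average nanoseconds per call of fn, minus an empty-loop baseline."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        pass
    baseline = time.perf_counter_ns() - start

    start = time.perf_counter_ns()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter_ns() - start
    return max(0, elapsed - baseline) / iterations


if __name__ == "__main__":
    static_ns = mean_ns_per_call(static_pair)
    dynamic_ns = mean_ns_per_call(dynamic_pair)
    print(f"static:   {static_ns / 1000:.2f} µs/request")
    print(f"dynamic:  {dynamic_ns / 1000:.2f} µs/request")
    print(f"overhead: {(dynamic_ns - static_ns) / 1000:.2f} µs/request")
```

Whatever the exact figure on a given machine, it can be compared directly against the >500 ms inference latencies cited above to judge whether the overhead matters.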

For production security, the paper notes one additional hardening measure: to protect against insider leakage of the digest inputs, deployments should replace unkeyed SHA-256 with HMAC-SHA256 under a server-side secret. This does not alter the basic interface but changes the trust assumptions around the digest inputs. A plausible implication is that the baseline construction primarily addresses separator reuse under ordinary external attack models, while HMAC-SHA256 extends protection against adversaries with stronger visibility into assembly metadata.
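Swapping the unkeyed hash for HMAC-SHA256 changes only the digest call. The sketch below uses Python's standard hmac module; the byte encoding (8-byte big-endian timestamp, then the raw 16-byte session ID and nonce) and the function name are assumptions, and in a real deployment the key would be loaded from a server-side secret store rather than generated in-process as in the usage example.

```python
import hashlib
import hmac
import secrets
import time
import uuid


def make_keyed_canary_pair(key: bytes, t: int, s: bytes, n: bytes) -> tuple[str, str]:
    """Canary pair from HMAC-SHA256 under a server-side key instead of plain SHA-256.

    Someone who learns (t, s, n) but not `key` cannot recompute the separators.
    """
    payload = t.to_bytes(8, "big") + s + n
    h_b = hmac.new(key, b"BEGIN:" + payload, hashlib.sha256).hexdigest()
    h_e = hmac.new(key, b"END:" + payload, hashlib.sha256).hexdigest()
    return "====BEGIN-" + h_b[:24] + "====", "====END-" + h_e[:24] + "===="


if __name__ == "__main__":
    server_key = secrets.token_bytes(32)  # stand-in for a key loaded from a secret store
    pair = make_keyed_canary_pair(
        server_key, time.time_ns(), uuid.uuid4().bytes, secrets.token_bytes(16)
    )
    print(pair)
```

The wrapper format and truncation are unchanged, so the output still satisfies the same single-line and length constraints as the unkeyed version.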

6. Relation to adjacent research and terminology

PromptSep operates in the space of prompt-injection defense and prompt assembly, which is distinct from prompt privacy and from other research that uses the same or similar name. A nearby but orthogonal line of work is Prεεmpt, a prompt sanitizer that protects sensitive prompt tokens during inference by combining format-preserving encryption for format-only tokens with metric differential privacy for value-dependent tokens (Chowdhury et al., 7 Apr 2025). The two systems address different threat models: PromptSep limits reuse of leaked separators in assembled prompts, whereas Prεεmpt sanitizes sensitive prompt content before submission to an LLM API. This suggests complementarity rather than direct substitution.

The name “PromptSep” is also used in unrelated domains. In audio generation, “PromptSep: Generative Audio Separation via Multimodal Prompting” denotes a conditional diffusion framework for open-vocabulary sound extraction and removal conditioned on text or vocal imitation (Wen et al., 6 Nov 2025). In multi-talker ASR, the term refers to Serialized Output Prompting, in which CTC-derived serialized outputs are supplied as prompts to an LLM decoder for mixed-speech transcription (Shi et al., 1 Sep 2025). These usages share a prompt-centric framing but not the security objective, architecture, or threat model of dynamic separator generation.

The terminological overlap is more than a naming curiosity. It underscores that “prompt separation” can refer to semantically different operations: delimiting instruction boundaries in LLM security, separating sources in audio generation, or structuring intermediate outputs for speech recognition. In the context of prompt-injection defense, however, PromptSep specifically denotes the dynamic per-request separator-generation mechanism introduced as an extension to PPA (Dorzhiev et al., 28 May 2026).