Feasibility of recovering the secret prompt-key from stegotext

Investigate whether an attacker who knows the exact Large Language Model used in the rank-preserving steganographic protocol can practically recover the secret prompt-key k from the stegotext s. Such an attack would exploit the fact that k is a mostly sound natural-language instruction coherent with the context of s. Quantify any achievable reduction of the key search space compared to naive brute force over the tokenizer vocabulary.

Background

The paper introduces a generative steganography protocol that hides an original text e inside a different, plausible text s using an LLM by preserving token ranks. Security analysis considers the scenario where an attacker knows the LLM but not the secret prompt-key k used to steer s and encode e.
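The rank-preserving idea can be illustrated with a toy sketch. This is not the authors' implementation: `toy_ranking` is a hypothetical stand-in for an LLM's ranked next-token distribution (a hash-based ordering over an eight-word vocabulary), and ranking e under an empty prompt is an assumption made purely for illustration.

```python
import hashlib

# Tiny hypothetical vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran", "far"]


def toy_ranking(context):
    """Order VOCAB deterministically given the context (stand-in for an LLM)."""
    def score(tok):
        data = (" ".join(context) + "|" + tok).encode()
        return hashlib.sha256(data).hexdigest()
    return sorted(VOCAB, key=score)


def ranks_of(tokens, prompt):
    """Rank of each token under the toy model, conditioned on prompt + prefix."""
    context = list(prompt)
    ranks = []
    for tok in tokens:
        ranks.append(toy_ranking(context).index(tok))
        context.append(tok)
    return ranks


def tokens_from_ranks(ranks, prompt):
    """Generate tokens by picking, at each step, the token with the given rank."""
    context = list(prompt)
    out = []
    for r in ranks:
        tok = toy_ranking(context)[r]
        out.append(tok)
        context.append(tok)
    return out


def encode(e, k):
    # Assumption: e is ranked under an empty prompt; s is steered by the key k.
    return tokens_from_ranks(ranks_of(e, prompt=[]), prompt=k)


def decode(s, k):
    # Recover the ranks of s under k, then replay them under the empty prompt.
    return tokens_from_ranks(ranks_of(s, prompt=k), prompt=[])


e = ["the", "cat", "sat", "on", "mat"]
k = ["write", "about", "dogs"]  # hypothetical prompt-key
s = encode(e, k)
recovered = decode(s, k)
```

In this toy, `decode(s, k)` returns e exactly and s has the same number of tokens as e. Decoding replays the ranks of s under k, so an attacker needs k (or a key inducing the same rankings) to recover them.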

The authors note that brute-force search over k is prohibitive due to the large vocabulary, but suggest a potential attack using properties of s to constrain k because k is expected to be a sensible, coherent instruction. They explicitly state that the feasibility of this attack is unclear and call it an open research question, while mentioning that adding randomness to k may thwart such attempts.
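To make the search-space comparison concrete, here is a back-of-the-envelope sketch. Every number in it is an assumption chosen for illustration, not a figure from the paper: the vocabulary size, the key length, the bits-per-token budget for "fluent" instructions, and the random-string alphabet are all hypothetical.

```python
import math


def naive_bits(vocab_size, key_len):
    """Bits of search for brute force over all token sequences of length key_len."""
    return key_len * math.log2(vocab_size)


def fluent_bits(bits_per_token, key_len):
    """Bits of search if the attacker only enumerates fluent instructions.

    bits_per_token is an assumed average entropy of natural-language keys.
    """
    return bits_per_token * key_len


def random_string_bits(alphabet_size, length):
    """Extra bits contributed by a uniformly random string inserted in k."""
    return length * math.log2(alphabet_size)


# Hypothetical setting: 32768-token vocabulary, 10-token key.
brute = naive_bits(32768, 10)          # 10 * 15 = 150 bits
fluent = fluent_bits(3.0, 10)          # assumed 3 bits/token -> 30 bits
padding = random_string_bits(64, 16)   # 16 chars over 64 symbols -> 96 bits
```

Under these assumed numbers, restricting the search to fluent instructions would cut it from 150 to 30 bits, while a 16-character random string would add 96 bits back. Whether an attacker can actually enumerate keys this efficiently, and how much s further narrows the candidates, is exactly the open question.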

References

However, the attacker could reduce the search space using the information revealed by s, since k is expected to be a mostly sound instruction in natural language and coherent with the context of s. Although the feasibility of such an approach is unclear and remains an open research question, we note that inserting a simple random string in k is enough to nip it in the bud, an example is shown in Figure~\ref{fig-harry}.

— LLMs can hide text in other text of the same length (2510.20075 - Norelli et al., 22 Oct 2025) in Section: Security
