Feasibility of recovering the secret prompt-key from stegotext
Investigate whether an attacker who knows the exact Large Language Model used in the rank-preserving steganographic protocol can practically recover the secret prompt-key k from the stegotext s. Such an attack would exploit the fact that k is a mostly sound natural-language instruction coherent with the context of s. Quantify any achievable reduction of the key search space compared to naive brute force over the tokenizer vocabulary.
References
However, the attacker could reduce the search space using the information revealed by s, since k is expected to be a mostly sound instruction in natural language and coherent with the context of s. Although the feasibility of such an approach is unclear and remains an open research question, we note that inserting a simple random string in k is enough to nip it in the bud; an example is shown in Figure~\ref{fig-harry}.
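
The search-space argument can be made concrete with a rough back-of-the-envelope calculation. The sketch below is illustrative only and is not taken from the source: it assumes a key of fixed token length, models "mostly sound natural language" as a fixed average number of bits per token (a hypothetical figure the attacker's language model might achieve), and assumes the random string consists of tokens drawn uniformly from the vocabulary. It does not model the additional constraint that k is coherent with s, which could shrink the space further. The function names and the numbers in the usage lines are assumptions chosen for illustration.

```python
import math


def naive_key_bits(vocab_size: int, key_len: int) -> float:
    """Bits of search effort for brute force over all token sequences
    of length key_len, i.e. log2(vocab_size ** key_len)."""
    return key_len * math.log2(vocab_size)


def natural_language_key_bits(bits_per_token: float, key_len: int) -> float:
    """Bits of search effort if the attacker only enumerates plausible
    natural-language keys. bits_per_token is an ASSUMED average entropy
    per token, not a measured value."""
    return key_len * bits_per_token


def padded_key_bits(bits_per_token: float, key_len: int,
                    vocab_size: int, random_len: int) -> float:
    """Bits of search effort when random_len uniformly random tokens are
    inserted into a natural-language key of key_len tokens (ASSUMPTION:
    the random part gains nothing from language modeling)."""
    return (natural_language_key_bits(bits_per_token, key_len)
            + naive_key_bits(vocab_size, random_len))


if __name__ == "__main__":
    # Hypothetical parameters: 32,000-token vocabulary, 20-token key,
    # 4 bits per token for natural-language instructions, 8 random tokens.
    vocab, key_len, h, pad = 32_000, 20, 4.0, 8
    print(f"naive brute force:      {naive_key_bits(vocab, key_len):.1f} bits")
    print(f"natural-language keys:  {natural_language_key_bits(h, key_len):.1f} bits")
    print(f"with random padding:    {padded_key_bits(h, key_len, vocab, pad):.1f} bits")
```

Under these assumptions, the gap between the first two figures is the reduction an attacker could hope for, and the third shows how a short random string restores much of the lost key entropy. Whether an attacker can actually enumerate the plausible keys efficiently remains the open question.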