Attack against unrestricted permutations of transformer hidden states

Develop a decoding attack that reconstructs the original input token sequence from the hidden states of a decoder-only transformer large language model after those hidden states have undergone an unrestricted permutation, meaning that any element of the N x d hidden-state matrix may be moved to any row or column index. The goal is to extend the demonstrated attacks beyond sequence-dimension, hidden-dimension, and factorized 2D permutations to this fully unconstrained elementwise setting, and thereby determine whether such permutations can be reversed to recover the user’s prompt.

Background

The paper introduces a vocabulary-matching attack that can recover original prompts from intermediate hidden states of LLMs with near-perfect accuracy, even when those hidden states are permuted. The authors demonstrate effectiveness against three permutation types commonly proposed in privacy-preserving inference schemes: sequence-dimension permutations, hidden-dimension permutations, and factorized 2D permutations (row-wise plus per-token hidden permutations).
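To make the three permutation classes concrete, the following NumPy sketch applies each one to a toy hidden-state matrix. It is an illustration only, not code from the paper: the function names, the matrix sizes, and the reading of "factorized 2D" as a row shuffle followed by an independent hidden-dimension shuffle per token are assumptions.

```python
import numpy as np

# Toy hidden-state matrix: N tokens, d hidden dimensions (hypothetical sizes).
rng = np.random.default_rng(0)
N, d = 4, 6
H = rng.standard_normal((N, d))


def permute_sequence(H, rng):
    """Shuffle whole rows: token order is hidden, each row stays intact."""
    return H[rng.permutation(H.shape[0]), :]


def permute_hidden(H, rng):
    """Shuffle columns: one hidden-dimension permutation shared by all tokens."""
    return H[:, rng.permutation(H.shape[1])]


def permute_factorized_2d(H, rng):
    """Shuffle rows, then shuffle within each row independently (assumed reading)."""
    n_rows, n_cols = H.shape
    rows = H[rng.permutation(n_rows), :]
    return np.stack([row[rng.permutation(n_cols)] for row in rows])


S = permute_sequence(H, rng)
C = permute_hidden(H, rng)
F = permute_factorized_2d(H, rng)
```

In all three cases the values belonging to a given token stay together in a single row, so the multiset of values per token is unchanged. The sketch only shows this structural property; it does not implement the vocabulary-matching attack itself.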

These results undermine the security assumptions of recent private inference protocols (PermLLM, STIP, and Centaur) that rely on the difficulty of reversing permuted hidden states. However, while the attack succeeds across the studied permutation classes, the authors note that they have not yet demonstrated success in a harder setting: unrestricted permutations, where any element of the N x d hidden-state matrix can be moved to any position. An attack in this setting, or evidence that none is feasible, would determine whether even the most general elementwise permutations can be reversed to recover the original input tokens.
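An unrestricted permutation can be sketched as a single shuffle of all N·d entries of the flattened matrix. The NumPy example below is a minimal illustration under that assumption; `permute_unrestricted` and the toy sizes are hypothetical names and values, not taken from the paper.

```python
import numpy as np


def permute_unrestricted(H, rng):
    """Move every element of the N x d matrix to an arbitrary position."""
    n_rows, n_cols = H.shape
    flat_perm = rng.permutation(n_rows * n_cols)  # one permutation over all entries
    return H.reshape(-1)[flat_perm].reshape(n_rows, n_cols)


rng = np.random.default_rng(0)
H = rng.standard_normal((4, 6))  # hypothetical sizes: 4 tokens, 6 hidden dims
P = permute_unrestricted(H, rng)

# The global multiset of values is preserved by construction.
same_values = np.array_equal(np.sort(P, axis=None), np.sort(H, axis=None))

# Per-token multisets are generally not preserved: values from different
# tokens are mixed within the same row of P.
rows_before = sorted(map(tuple, np.sort(H, axis=1)))
rows_after = sorted(map(tuple, np.sort(P, axis=1)))
rows_intact = rows_before == rows_after
print(same_values, rows_intact)
```

The contrast with the structured classes is that values from different tokens can end up in the same row, so an attacker no longer knows which entries belong to the same token.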

References

We have not yet demonstrated a successful attack against unrestricted permutations of hidden states, i.e. where any element of the $N \times d$ matrix of hiddens can be moved to any column or row index without restriction.

— An Attack to Break Permutation-Based Private Third-Party Inference Schemes for LLMs (2505.18332 - Thomas et al., 23 May 2025) in Conclusion and Future Work
