Attack against unrestricted permutations of transformer hidden states
Develop a decoding attack that reconstructs the original input token sequence from the hidden states of a decoder-only transformer large language model after those hidden states have been subjected to an unrestricted permutation. Here, an unrestricted permutation means that any element of the $N \times d$ hidden-state matrix may be moved to any row and column index without restriction. The goal is to extend the previously demonstrated attacks, which handle sequence-dimension, hidden-dimension, and factorized 2D permutations, to this fully unconstrained elementwise setting, and thereby determine whether such permutations can be reversed to recover the user’s prompt.
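To make the four settings concrete, here is a minimal NumPy sketch, assuming the hidden states are a dense float array `H` of shape `(N, d)` and that each permutation is drawn uniformly at random. The function names are hypothetical, and reading "factorized 2D" as one row permutation composed with one column permutation is an assumption, not a definition taken from the prior attacks.

```python
import numpy as np

def permute_sequence(H, rng):
    """Sequence-dimension permutation: shuffle the N rows (token positions)."""
    return H[rng.permutation(H.shape[0]), :]

def permute_hidden(H, rng):
    """Hidden-dimension permutation: shuffle the d columns, same order for every row."""
    return H[:, rng.permutation(H.shape[1])]

def permute_factorized_2d(H, rng):
    """Assumed factorized 2D permutation: one row permutation plus one column permutation."""
    rows = rng.permutation(H.shape[0])
    cols = rng.permutation(H.shape[1])
    return H[np.ix_(rows, cols)]

def permute_unrestricted(H, rng):
    """Unrestricted permutation: any of the N*d entries may land at any (row, col)."""
    flat = H.reshape(-1)
    return flat[rng.permutation(flat.size)].reshape(H.shape)
```

In the first three schemes every output row is still a (possibly reordered) copy of a single input row, so per-token vectors survive as units; the unrestricted scheme scatters each token's entries across the whole matrix.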
We have not yet demonstrated a successful attack against unrestricted permutations of hidden states, i.e., permutations in which any element of the $N \times d$ matrix of hidden states can be moved to any row or column index without restriction.
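One way to see what an attacker is left with: every unrestricted permutation preserves the multiset of the $N \cdot d$ entries, so any function of that multiset (such as the sorted, flattened values) is invariant, while row and column membership is not. The sketch below, with hypothetical helper names, checks that invariant and compares the sizes of the permutation groups for each scheme. Group size is only a rough indication of difficulty; an attack need not enumerate permutations, and the number of distinct permuted matrices can be smaller when entries repeat.

```python
import math
import numpy as np

def observable_invariant(P):
    """Sorted flattened entries: unchanged by any elementwise permutation of P."""
    return np.sort(P, axis=None)

def log10_group_size(N, d):
    """log10 of the number of permutations available under each scheme."""
    def log10_fact(n):
        # lgamma(n + 1) = ln(n!), converted to base 10
        return math.lgamma(n + 1) / math.log(10)
    return {
        "sequence": log10_fact(N),
        "hidden": log10_fact(d),
        "factorized_2d": log10_fact(N) + log10_fact(d),
        "unrestricted": log10_fact(N * d),
    }
```

Even for a toy matrix with $N = 8$ and $d = 16$, the unrestricted group exceeds the factorized 2D group by roughly 200 orders of magnitude.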