Transferability of adversarial batch effects across prompts from the same distribution
Determine whether inputs drawn from the same distribution as a targeted prompt $x^*$ receive similar enough expert routing assignments in Mixture-of-Experts transformer models that an adversarial batch, optimized against $x^*$, also perturbs their outputs when they are batched together.
References
We conjecture that if these other data points are sampled from the same distribution as $x^*$, they are more likely to have similar expert routing assignments, and so are more likely to be affected by $\tilde{X}_{\mathcal{A}}$, which was optimized specifically for $x^*$.
— Buffer Overflow in Mixture of Experts
(2402.05526 - Hayes et al., 8 Feb 2024) in Section 4: Anecdotal evidence of transferability to different prompts