Develop and evaluate mitigation strategies against representation-level hijacking
Develop and rigorously evaluate mitigation strategies that defend large language models against in-context representation hijacking of token semantics, moving beyond attack characterization to concrete protective mechanisms.
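One direction such a defense could take (a hypothetical sketch, not a mechanism from the paper): monitor how far a token's in-context hidden representation drifts from a reference representation of the same token, and flag inputs whose drift exceeds a calibrated threshold. Everything below is illustrative; the function names, the synthetic 4-d "hidden states", and the threshold of 0.5 are assumptions, not values from the cited work.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def drift_score(contextual, reference):
    """1 - cosine similarity: higher means the in-context representation
    has moved further from the token's reference meaning."""
    return 1.0 - cosine(contextual, reference)

def flag_hijacked_tokens(contextual_states, reference_states, threshold=0.5):
    """Return indices of tokens whose representation drift exceeds threshold.
    (threshold=0.5 is an arbitrary illustrative value; in practice it would
    be calibrated on benign inputs.)"""
    return [i for i, (c, r) in enumerate(zip(contextual_states, reference_states))
            if drift_score(c, r) > threshold]

# Toy demonstration with synthetic 4-d "hidden states".
refs = [np.array([1.0, 0.0, 0.0, 0.0]),
        np.array([0.0, 1.0, 0.0, 0.0])]
ctx = [np.array([0.9, 0.1, 0.0, 0.0]),   # close to its reference: benign
       np.array([0.0, -1.0, 0.0, 0.0])]  # inverted direction: flagged
print(flag_hijacked_tokens(ctx, refs))  # -> [1]
```

A real implementation would replace the synthetic vectors with hidden states extracted from the model at a chosen layer, and the thresholding with a detector calibrated against benign context distributions; the sketch only fixes the interface such a defense might expose.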
References
Second, our work focuses on the attack surface and does not yet evaluate specific mitigation strategies. These open questions serve as stepping stones toward a new research frontier: representation-level alignment and defense.
— In-Context Representation Hijacking
(arXiv:2512.03771, Yona et al., 3 Dec 2025), Section 6: Discussion, limitations, and future work