Cross-linguistic generality of adversarial poetry jailbreaks
Determine whether the elevated attack-success rates induced by poetic reformulation of harmful prompts generalize beyond English and Italian to other languages, scripts, and culturally distinct poetic forms. Also ascertain how any such cross-linguistic generalization interacts with model pretraining corpora and safety-alignment distributions.
References
Sixth, the evaluation is limited to English and Italian prompts. The generality of the effect across other languages, scripts, or culturally distinct poetic forms is unknown and may interact with both pretraining corpora and alignment distributions.
— Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models
(2511.15304 - Bisconti et al., 19 Nov 2025) in Section Analysis, Subsection Limitations