Effect of Retokenization-Induced Spacing on ArtPrompt’s Success
Determine whether the additional spaces introduced by Byte Pair Encoding (BPE) dropout retokenization effectively act as a new ASCII art font for the ArtPrompt jailbreak attack, and whether this in turn reduces the likelihood of triggering safety measures in the aligned large language models evaluated in the paper (GPT-3.5 0613, GPT-4 0613, Claude v2, Gemini Pro, and Llama2 Chat-7B).
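To make the mechanism in question concrete, the following is a minimal toy sketch of how dropout-style retokenization can insert extra spaces into a line of ASCII art. It is not the paper's implementation: the function names `bpe_dropout_tokenize` and `retokenize_with_spaces`, the tiny merge table, the dropout rate, and the choice to join the resulting tokens with single spaces are all assumptions made for illustration only.

```python
import random


def bpe_dropout_tokenize(text, merges, p_drop, rng):
    """Toy BPE with dropout (hypothetical helper, not a real tokenizer).

    Starts from single characters and applies each merge rule in order,
    skipping every individual merge opportunity with probability p_drop.
    """
    tokens = list(text)
    for left, right in merges:
        merged = []
        i = 0
        while i < len(tokens):
            can_merge = (
                i + 1 < len(tokens)
                and tokens[i] == left
                and tokens[i + 1] == right
            )
            # rng.random() is in [0, 1): p_drop=0 always merges, p_drop=1 never does.
            if can_merge and rng.random() >= p_drop:
                merged.append(left + right)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens


def retokenize_with_spaces(text, merges, p_drop=0.2, seed=0):
    """Retokenize each line and join the tokens with spaces.

    Assumption: the retokenized text is rendered with a space between
    tokens, which is what would visibly change the spacing of the art.
    """
    rng = random.Random(seed)
    lines = []
    for line in text.split("\n"):
        tokens = bpe_dropout_tokenize(line, merges, p_drop, rng)
        lines.append(" ".join(tokens))
    return "\n".join(lines)


if __name__ == "__main__":
    # A benign ASCII art letter "A" and an illustrative merge table.
    art = " ## \n#  #\n####\n#  #"
    toy_merges = [("#", "#"), ("##", "##")]
    print(retokenize_with_spaces(art, toy_merges, p_drop=0.2))
```

Under these assumptions, a higher `p_drop` leaves more characters unmerged and therefore adds more spaces per line, which stretches the art horizontally. Whether such stretched art is read by a model as a different font is exactly the question this problem poses; the sketch only shows where the extra spacing could come from.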
References
We note that Retokenization may even help ArtPrompt to improve ASR. We conjecture that this is because the spaces introduced by Retokenization forms a new font for ArtPrompt, which further reduces the chance of triggering safety measures deployed by victim models.
— ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs
(2402.11753 - Jiang et al., 19 Feb 2024) in Section 4.2 Experimental Results, paragraph "ArtPrompt can bypass existing defenses against jailbreak attacks" (following Table titled "This table presents the effectiveness of ArtPrompt when PPL, Paraphrase, or Retokenization is employed by victim LLMs.")