Dynamics of the tokenizer-transplant vulnerability in multimodal and divergent-script settings

Investigate the dynamics of the shared-basis tokenizer-transplant vulnerability in multimodal models and in languages with extremely divergent scripts: determine whether, and how, a single engineered breaker token remains inert in the donor model yet becomes a high-salience trigger in the base model after transplant via coefficient reuse.
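
For concreteness, the sketch below shows one common formulation of shared-basis coefficient reuse, assuming each transplanted token's donor embedding is fit as a linear combination of anchor tokens present in both vocabularies, with the same coefficients then applied to the base model's anchor embeddings. All names, array shapes, and the random data are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def fit_coefficients(donor_anchor_embs, donor_token_emb):
        # Solve donor_anchor_embs.T @ c ~= donor_token_emb by least squares.
        # donor_anchor_embs: (k, d_donor); donor_token_emb: (d_donor,)
        c, *_ = np.linalg.lstsq(donor_anchor_embs.T, donor_token_emb, rcond=None)
        return c  # (k,) coefficients over the shared anchors

    def transplant_embedding(base_anchor_embs, coeffs):
        # Reuse the donor-side coefficients against the base model's anchors.
        # base_anchor_embs: (k, d_base) -> reconstructed embedding (d_base,)
        return base_anchor_embs.T @ coeffs

    rng = np.random.default_rng(0)
    k, d_donor, d_base = 64, 512, 1024             # illustrative sizes
    donor_anchors = rng.normal(size=(k, d_donor))  # shared anchor embeddings, donor side
    base_anchors = rng.normal(size=(k, d_base))    # shared anchor embeddings, base side

    new_token_emb = rng.normal(size=d_donor)       # donor-side embedding of the added token
    c = fit_coefficients(donor_anchors, new_token_emb)
    e_base = transplant_embedding(base_anchors, c)  # what the base model receives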

Background

The paper demonstrates a training-free attack on the tokenizer-transplant pipelines used to align vocabularies across different LLM families. By exploiting shared-basis coefficient reuse, an attacker can append to a donor tokenizer a single "breaker token" whose embedding is functionally inert pre-transplant but reconstructs into a high-salience direction in the base model post-transplant, causing the base model to emit the malicious token during generation.
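
Under the same assumed formulation, the asymmetry itself can be sketched numerically: pick a coefficient vector in the null space of the donor anchors (so the donor-side reconstruction is essentially zero, i.e., inert) that, under the base anchors, best reproduces a chosen target direction. The target here is a random stand-in for a high-salience base-model direction; the paper's actual construction may differ.

    import numpy as np

    rng = np.random.default_rng(0)
    k, d_donor, d_base = 2048, 512, 1024          # more anchors than donor dimensions
    D = rng.normal(size=(k, d_donor))             # donor anchor embeddings (stand-ins)
    B = rng.normal(size=(k, d_base))              # base anchor embeddings (stand-ins)
    target = rng.normal(size=d_base)              # stand-in high-salience base direction

    # Null space of the donor reconstruction map c -> D.T @ c: any coefficient
    # vector in this subspace reconstructs to (near) zero in the donor model.
    _, _, Vt = np.linalg.svd(D.T, full_matrices=True)
    null_basis = Vt[d_donor:]                     # shape (k - d_donor, k)

    # Within that subspace, choose the combination whose base-side
    # reconstruction best matches the target direction.
    M = B.T @ null_basis.T                        # shape (d_base, k - d_donor)
    a, *_ = np.linalg.lstsq(M, target, rcond=None)
    c_breaker = null_basis.T @ a                  # engineered coefficient vector

    print(np.linalg.norm(D.T @ c_breaker))            # ~0: inert in the donor
    print(np.linalg.norm(B.T @ c_breaker - target))   # small: salient in the base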

Empirical results show the attack’s effectiveness across multiple text-only model families and transplant operators, its persistence after fine-tuning and merging, and its stealth under common geometric audits. However, the evaluation is restricted to text-based LLMs using related scripts, leaving open whether similar asymmetric realizability and stealth properties occur in multimodal contexts (e.g., vision-LLMs) or in language families with highly divergent scripts.
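
To make "common geometric audits" concrete, below is one hypothetical norm-outlier check of the kind such an attack would need to evade, together with the direction-preserving rescaling that defeats it; the function names and threshold are invented for illustration.

    import numpy as np

    def flags_norm_outlier(vocab_embs, candidate, z_thresh=3.0):
        # Flag a token whose embedding norm is a z-score outlier
        # relative to the rest of the vocabulary.
        norms = np.linalg.norm(vocab_embs, axis=1)
        z = (np.linalg.norm(candidate) - norms.mean()) / norms.std()
        return abs(z) > z_thresh

    def rescale_to_median_norm(vocab_embs, candidate):
        # Rescaling preserves the embedding's direction, so its alignment
        # with base-model activations is unchanged while the norm statistic
        # matches the vocabulary and the check above does not fire.
        med = np.median(np.linalg.norm(vocab_embs, axis=1))
        return candidate * (med / np.linalg.norm(candidate))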

References

"Finally, our evaluation is currently bounded to text-based LLMs; exploring the dynamics of this vulnerability in multimodal contexts or extremely divergent script families remains an open avenue for future research."

The Trojan in the Vocabulary: Stealthy Sabotage of LLM Composition (2601.00065, Liu et al., 31 Dec 2025), Limitations section.