Invisible Unicode Characters
- Invisible Unicode characters are non-rendering codepoints that allow hidden data embedding and covert channels in text streams.
- They employ encoding schemes like zero-width binary, variation selectors, and tag encoding to transmit hidden binary messages.
- Experimental studies show these techniques can achieve near-perfect adversarial success in LLMs, posing serious security risks.
Invisible Unicode characters are codepoints in the Unicode standard that do not produce any visible glyph when rendered in common fonts, yet persist within the text buffer and influence downstream processing. These characters, including variation selectors, zero-width spaces, and tag characters, create covert channels for information embedding and have emerged as a potent vector for adversarial prompt injection, steganography, and bypassing input validation, particularly in modern LLMs and other Unicode-aware systems (Gao et al., 6 Oct 2025, Graves, 26 Feb 2026).
1. Classes and Formal Properties of Invisible Unicode Characters
The primary classes of invisible Unicode characters exploited in recent research comprise variation selectors (VS), zero-width spaces (ZWSP, U+200B), zero-width non-joiners (ZWNJ, U+200C), and Unicode tag characters (U+E0000–U+E007F). Each class is characterized by its invisibility in mainstream renderers, while retaining distinct codepoints detectable by text parsers and LLM tokenizers.
Variation Selectors:
- Originally designed to request alternate glyph variants (e.g., emoji styles), VS codepoints are defined in two contiguous Unicode blocks:
- VS1–VS16: U+FE00 through U+FE0F (16 codepoints)
- VS17–VS256: U+E0100 through U+E01EF (240 codepoints)
- When appended to an ordinary alphanumeric or punctuation character, the VS codepoint is rendered as "nothing" (invisible), persists through copy-paste, and remains exposed to systems that process raw Unicode.
Zero-Width Characters and Tags:
- ZWSP (U+200B) and ZWNJ (U+200C) are completely invisible and designed for text shaping, yet function as ideal binary carriers.
- Unicode Tag Characters (U+E0000–U+E007F), originally intended for tagging content, are deprecated but persist as invisible codepoints that are handled distinctly by LLM tokenizers (Graves, 26 Feb 2026).
| Unicode Class | Codepoint Range | Typical Use / Encoding Role |
|---|---|---|
| Variation Selectors | U+FE00–U+FE0F, U+E0100–U+E01EF | Suffixes in LLM adversarial prompts |
| Zero-Width Space | U+200B | Bit 0 in binary encoding |
| Zero-Width Non-Joiner | U+200C | Bit 1 in binary encoding |
| Tag Characters | U+E0000–U+E007F | Character-level hidden messages |
These codepoints form a general class of adversarial channels for systems operating on Unicode (Gao et al., 6 Oct 2025).
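The codepoint ranges in the table above can be checked mechanically. As a minimal sketch (the class names and helper below are illustrative, not from the cited papers), the following scanner reports every invisible codepoint in a string together with its class:

```python
# Invisible-codepoint ranges from the table above (Unicode standard ranges).
INVISIBLE_RANGES = {
    "variation_selector":    [(0xFE00, 0xFE0F), (0xE0100, 0xE01EF)],
    "zero_width_space":      [(0x200B, 0x200B)],
    "zero_width_non_joiner": [(0x200C, 0x200C)],
    "tag_character":         [(0xE0000, 0xE007F)],
}

def scan_invisibles(text: str) -> list[tuple[int, str, str]]:
    """Return (index, class, U+XXXX) for every invisible codepoint found."""
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        for cls, ranges in INVISIBLE_RANGES.items():
            if any(lo <= cp <= hi for lo, hi in ranges):
                hits.append((i, cls, f"U+{cp:04X}"))
    return hits

# "He\u200bllo\ufe0f" renders as "Hello" but carries two hidden codepoints.
print(scan_invisibles("He\u200bllo\ufe0f"))
```

A scanner of this kind is the detection counterpart to the encoding schemes discussed in the next section.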
2. Encoding and Injection Methodologies
Multiple formal encoding schemes exploit invisible Unicode characters for covert instruction injection or information embedding.
Zero-Width Binary Encoding:
- Encodes each bit of an ASCII payload as a ZWSP (bit 0) or ZWNJ (bit 1).
- Given a bitstream $b_1 b_2 \cdots b_n$ with $b_i \in \{0, 1\}$, the encoded string is $E = c_{b_1} c_{b_2} \cdots c_{b_n}$, where $c_0 = \text{U+200B}$ and $c_1 = \text{U+200C}$.
- For a message like "Hi" (ASCII 0x48, 0x69), each of the 16 bits is mapped to U+200B or U+200C and the result is embedded between visible words (Graves, 26 Feb 2026).
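The zero-width binary scheme can be sketched in a few lines. This is an illustrative round-trip implementation (the function names are assumptions, not from the paper); the decoder assumes the hidden span contains only ZWSP/ZWNJ characters:

```python
ZWSP, ZWNJ = "\u200b", "\u200c"  # carrier characters: bit 0, bit 1

def zw_encode(payload: str) -> str:
    """Encode each ASCII byte of `payload` as 8 invisible bit-characters."""
    bits = "".join(f"{ord(c):08b}" for c in payload)
    return "".join(ZWNJ if b == "1" else ZWSP for b in bits)

def zw_decode(hidden: str) -> str:
    """Inverse of zw_encode; expects a string of only ZWSP/ZWNJ."""
    bits = "".join("1" if ch == ZWNJ else "0" for ch in hidden)
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))

hidden = zw_encode("Hi")                 # 16 invisible characters
carrier = "Hello " + hidden + "world"    # renders exactly as "Hello world"
print(len(carrier), zw_decode(hidden))
```

The carrier string is 16 codepoints longer than it appears, yet displays identically to the unmodified text.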
Unicode Tag Encoding:
- Maps each ASCII character $c$ to its corresponding tag character $t(c) = \text{U+E0000} + \operatorname{codepoint}(c)$.
- For a payload $c_1 c_2 \cdots c_n$, the encoded string is $T = t(c_1)\, t(c_2) \cdots t(c_n)$.
- The result is a string that is invisible when rendered but distinct at the codepoint level.
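Because the tag block mirrors the ASCII layout at offset U+E0000, the mapping is a single addition per character. A minimal sketch of the scheme (helper names are illustrative; the tag block only covers codepoints 0x00-0x7F, so the payload must be ASCII):

```python
TAG_BASE = 0xE0000  # start of the deprecated Unicode tag block

def tag_encode(payload: str) -> str:
    """Shift each ASCII character into the invisible tag block."""
    return "".join(chr(TAG_BASE + ord(c)) for c in payload)  # ASCII only

def tag_decode(hidden: str) -> str:
    """Shift tag characters back down to ASCII."""
    return "".join(chr(ord(ch) - TAG_BASE) for ch in hidden)

hidden = tag_encode("Hi")
print([f"U+{ord(c):05X}" for c in hidden])  # tag-block codepoints
print(tag_decode(hidden))
```

Note that, unlike zero-width binary, this scheme preserves a one-to-one character correspondence, which is why it remains distinguishable at the codepoint level.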
Variation Selector-Based Suffixes:
- Appends a randomly mutated sequence of VS codepoints to a visible prompt to maximize the likelihood that the LLM produces a targeted (often harmful or policy-violating) response.
- Each VS codepoint is mapped to multiple distinct LLM token IDs, forming a "secret" channel that is invisible to the user (Gao et al., 6 Oct 2025).
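The suffix-construction side of this attack can be sketched as random sampling plus single-position mutation over the VS pool. This is only the search scaffolding (the actual attack in Gao et al. optimizes the suffix under feedback from the target model's outputs, which is omitted here; all names are illustrative):

```python
import random

# Pool of all 256 variation selectors: VS1-VS16 and VS17-VS256.
VS_POOL = [chr(cp) for cp in range(0xFE00, 0xFE10)] + \
          [chr(cp) for cp in range(0xE0100, 0xE01F0)]

def random_vs_suffix(length: int, rng: random.Random) -> str:
    """Sample an initial invisible suffix of the given length."""
    return "".join(rng.choice(VS_POOL) for _ in range(length))

def mutate(suffix: str, rng: random.Random) -> str:
    """Replace one random position with a fresh variation selector."""
    i = rng.randrange(len(suffix))
    return suffix[:i] + rng.choice(VS_POOL) + suffix[i + 1:]

rng = random.Random(0)
suffix = mutate(random_vs_suffix(4, rng), rng)
prompt = "Summarize this document." + suffix  # renders as the visible text alone
print(len(prompt) - len("Summarize this document."))  # 4 invisible codepoints
```

In the full attack, each mutated candidate would be scored against the target model and kept only if it increases the likelihood of the targeted response.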
Prompt Placement and Framing:
- Invisible payloads are typically inserted at benign syntactic boundaries (e.g., between the first and second word in a sentence) to ensure imperceptibility.
- Hint levels may be varied from unhinted (no clue provided) to explicit decoding instructions, increasing LLM compliance (Graves, 26 Feb 2026).
3. Effect on LLMs and Tokenization
Invisible Unicode characters fundamentally alter the tokenization process in LLMs, resulting in divergent prompt representations between user-readable text and model-consumed input.
- Most LLM tokenizers operate at the level of Unicode codepoints or UTF-8 byte streams, and assign unique sub-token sequences to each VS, ZWSP, ZWNJ, or tag character (Gao et al., 6 Oct 2025, Graves, 26 Feb 2026).
- For example, VS-50 (U+E0121) is tokenized as the sub-token sequence (175, 254, 226, 94) by GPT-4 and GPT-3.5; most selectors occupy four sub-tokens.
- The semantic content of a prompt remains visually unchanged, while adversarial suffixes composed from invisible codepoints manipulate the model’s conditional probabilities and output trajectories.
- This property enables the generation of an adversarial suffix $s$ such that, for a prompt $p$, the probability $P(y^\star \mid p \oplus s)$ of a target token $y^\star$ (e.g., "Sure"), where $\oplus$ denotes concatenation, is maximized, thus bypassing alignment mechanisms and model safety layers (Gao et al., 6 Oct 2025).
A key observation is that prompt injection using these codepoints can succeed at high rates across several leading LLM architectures, with Attack Success Rates (ASR) reaching up to 100% in controlled benchmarks.
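The multi-sub-token footprint of these codepoints is visible even without access to a proprietary tokenizer: under byte-level tokenization, the UTF-8 byte count sets a floor on the sub-token count. A quick check (exact token counts still depend on the specific tokenizer's merges, so this only illustrates the byte footprint):

```python
# Supplementary-plane codepoints (>= U+10000), such as VS17-VS256 and tag
# characters, need 4 UTF-8 bytes; BMP invisibles like ZWSP need 3.
for cp in (0xFE0F, 0xE0121, 0x200B, 0xE0048):
    ch = chr(cp)
    raw = ch.encode("utf-8")
    print(f"U+{cp:05X} -> {len(raw)} UTF-8 bytes: {list(raw)}")
```

This is consistent with the observation above that most variation selectors occupy four sub-tokens in byte-level GPT tokenizers.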
4. Experimental Evidence and Impact Assessment
Empirical studies have validated both the efficacy and high impact of invisible Unicode character–based attacks against robust LLMs.
Imperceptible Jailbreaking Experiments:
- Using 50 malicious questions (AdvBench) and four aligned LLMs (Vicuna-13B-v1.5, Llama-2-Chat-7B, Llama-3.1-Instruct-8B, Mistral-7B-Instruct-v0.2), adversarial suffixes of length 4 (5 for Llama-3.1) achieved the following ASR (scored by a GPT-4 judge):
- Vicuna-13B-v1.5: 100%
- Llama-2-Chat-7B: 98%
- Llama-3.1-Instruct-8B: 80%
- Mistral-7B-Instruct-v0.2: 100%
- Baselines with visible modifications achieved lower or comparable rates (e.g., GCG: 54–98%), while random or no VS suffixes resulted in <30% ASR (Gao et al., 6 Oct 2025).
Reverse CAPTCHA Evaluation:
- Using five models (OpenAI: GPT-5.2, GPT-4o-mini; Anthropic: Claude Opus 4, Sonnet 4, Haiku 4.5), 8,308 graded outputs covering two encoding schemes and four hint levels revealed:
- Compliance (following the hidden Unicode-encoded instructions) was dramatically amplified by tool use (from 0.8% to 49.2% for Anthropic Haiku, with Cohen’s h up to 1.37).
- Encoding preference was provider specific: OpenAI models reliably decoded zero-width binary; Anthropic models preferred tag-based encodings.
- Increasing "hint" granularity (contextual cues or explicit decoding instruction) raised compliance by up to 95 percentage points within the same model and task (Graves, 26 Feb 2026).
Both studies highlight a major underexplored attack surface in prompt injection and adversarial prompting.
5. Detection, Mitigation, and Defense Strategies
Defense against invisible Unicode attacks is a multifaceted challenge due to the adaptive and covert nature of the channel.
Input Sanitization:
- Strip or normalize all VS, ZWSP, ZWNJ, and tag characters before model ingestion.
- Selective sanitization is required to avoid inadvertently removing legitimate codepoints used in specific scripts or emoji sequences.
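A baseline sanitizer can strip the four classes discussed above in one pass. This sketch deliberately has no allow-list (an assumption for brevity); a production filter would need to preserve variation selectors that legitimately follow emoji or CJK base characters:

```python
import re

# Single character class covering ZWSP, ZWNJ, VS1-VS16, tag characters,
# and VS17-VS256. No allow-listing: this will also strip legitimate uses.
INVISIBLE_RE = re.compile(
    "[\u200b\u200c\ufe00-\ufe0f"
    "\U000E0000-\U000E007F"
    "\U000E0100-\U000E01EF]"
)

def sanitize(text: str) -> str:
    """Remove all invisible codepoints before model ingestion."""
    return INVISIBLE_RE.sub("", text)

print(sanitize("He\u200bllo\U000E0048"))  # hidden payload removed
```
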
Tokenizer-Level Filtering:
- Collapse or filter all invisible codepoints prior to tokenization so the LLM does not process adversarial payloads.
Tool-Use Guardrails:
- Monitor model-executed code for characteristic decoding patterns, such as tag-offset subtraction (e.g., `ord(ch) - 0xE0000`) or binary-to-ASCII loops, flagging or blocking suspicious subprocesses (Graves, 26 Feb 2026).
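Such a guardrail can be approximated with a few pattern heuristics over tool-call source code. The patterns below are hypothetical examples of what a deployment might flag, not rules from the paper:

```python
import re

# Heuristic signatures of invisible-payload decoders in generated code.
SUSPICIOUS = [
    re.compile(r"ord\([^)]*\)\s*-\s*0x[Ee]0000"),  # tag-offset subtraction
    re.compile(r"int\([^,]*,\s*2\)"),              # binary-string-to-int loop
    re.compile(r"\\u200[bc]", re.IGNORECASE),      # ZWSP/ZWNJ escape literals
]

def flag_decoder(code: str) -> bool:
    """Return True if the code matches any known decoding signature."""
    return any(p.search(code) for p in SUSPICIOUS)

snippet = "msg = ''.join(chr(ord(c) - 0xE0000) for c in s)"
print(flag_decoder(snippet))
```

Pattern matching of this kind is cheap but easily evaded by obfuscation, so it complements rather than replaces input sanitization.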
Training-Time Hardening:
- Augment LLM training data with invisible payload examples coupled with explicit "do not comply" supervision, to improve robustness against these vectors.
Perplexity/Entropy Filtering:
- Detect anomalously low-entropy, long sequences of invisible characters, potentially indicative of an attack (Gao et al., 6 Oct 2025).
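One concrete form of this detector is run-length scanning: legitimate text rarely contains long unbroken runs of invisible codepoints, while binary payloads produce them by construction. A sketch, with the run-length threshold as an assumed tuning parameter (not from the papers):

```python
def invisible_runs(text: str, threshold: int = 8) -> list[tuple[int, int]]:
    """Return (start, length) of each run of invisible codepoints >= threshold."""
    def invisible(ch: str) -> bool:
        cp = ord(ch)
        return cp in (0x200B, 0x200C) or 0xFE00 <= cp <= 0xFE0F \
            or 0xE0000 <= cp <= 0xE007F or 0xE0100 <= cp <= 0xE01EF

    runs, start = [], None
    for i, ch in enumerate(text):
        if invisible(ch) and start is None:
            start = i                      # run begins
        elif not invisible(ch) and start is not None:
            if i - start >= threshold:
                runs.append((start, i - start))
            start = None                   # run ends
    if start is not None and len(text) - start >= threshold:
        runs.append((start, len(text) - start))
    return runs

payload = "Hi " + "\u200b\u200c" * 8 + "there"
print(invisible_runs(payload))
```

A 16-character zero-width payload between two visible words is flagged as a single anomalous run, while isolated invisible codepoints (e.g., a lone emoji variation selector) fall below the threshold.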
Provider-Specific Probes:
- Model vulnerabilities are often encoding and provider specific. Defenders are advised to probe deployed models for both encoding schemes to calibrate protective filters (Graves, 26 Feb 2026).
Potential limitations of these defenses include the possibility of over-filtering (collateral removal of meaningful, non-malicious codepoints) and adversarial adaptation (attackers switching to new classes of invisible characters).
6. Broader Implications and Future Research
Invisible Unicode characters constitute a general class of adversarial channels—any Unicode-aware system that relies on plain-text safety validation is inherently susceptible (Gao et al., 6 Oct 2025). The capacity for adversaries to embed imperceptible, non-semantic payloads into otherwise benign prompts creates significant risks for integrity, safety, and reliability, not only in LLMs but in other applications including code review pipelines, moderation engines, and authentication systems.
Emerging research directions include the development of comprehensive Unicode normalization strategies, deeper analysis of non-rendering codepoints (e.g., directional overrides, joiners), and the integration of continual adversarial robustness evaluation into the LLM deployment lifecycle (Gao et al., 6 Oct 2025, Graves, 26 Feb 2026). A plausible implication is that unified, multi-layered defenses—combining input sanitization, tokenizer-level preprocessing, adaptive tool-use monitoring, and training-time adversarial data augmentation—will be essential to neutralize the covert instruction channel facilitated by invisible Unicode.