Benign Visual & Textual Perturbations
- Benign visual and textual perturbations are small, meaning-preserving modifications that maintain human legibility yet reveal vulnerabilities in neural and vision-language models.
- They are generated using techniques like homoglyph replacements, diacritic insertions, and minor image transformations, validated with metrics such as legibility scores and model stability indicators.
- Research demonstrates significant performance drops on perturbed inputs, underscoring the need for robust, perturbation-aware defenses and improved model architectures.
Benign visual and textual perturbations are small, meaning-preserving modifications to inputs that are legible or recognizable to humans but can significantly alter the behavior or reliability of artificial neural models and vision-language systems. These perturbations are “benign” in the sense that human comprehension and semantic content remain intact, yet models often exhibit brittleness or vulnerability, resulting in degraded performance or erroneous predictions. Research across NLP, VQA, and LVM architectures has both formalized and experimentally exposed these effects, illuminating model weaknesses and motivating new paradigms for robustness, legibility quantification, and defense strategies.
1. Formal Definitions and Taxonomies
Benign perturbations are categorized primarily by modality and mechanism:
Visual Textual Perturbations: Defined over a discrete character space, these involve the replacement or augmentation of standard characters (ASCII or Unicode) with visually similar glyphs through homoglyphs, combining diacritics (e.g., Unicode U+0300–U+036F), or image-based neighbors. Let $s$ be a Unicode string and $D$ the set of combining diacritics. The set of perturbed variants is

$$\mathcal{P}(s) = \{\,\mathrm{insert}(s, i, d) \;:\; i \in I(s),\ d \in D\,\},$$

where $I(s)$ enumerates valid insertion positions (Boucher et al., 2023).
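The variant set can be enumerated directly. The following is a minimal sketch (function and variable names are ours, not from the cited work) that builds every single-insertion variant of a string using combining marks from U+0300–U+036F; a real attack would sample from this set rather than exhaust it.

```python
from itertools import product

# Combining diacritical marks block (Unicode U+0300-U+036F).
DIACRITICS = [chr(cp) for cp in range(0x0300, 0x0370)]

def perturbed_variants(s: str) -> set:
    """Enumerate insert(s, i, d) for every valid insertion position i
    (here: after any alphabetic base character) and diacritic d,
    mirroring the variant set P(s) defined above."""
    positions = [i for i, ch in enumerate(s) if ch.isalpha()]
    return {s[: i + 1] + d + s[i + 1:] for i, d in product(positions, DIACRITICS)}

if __name__ == "__main__":
    variants = perturbed_variants("benign")
    print(f"{len(variants)} single-insertion variants of 'benign'")
```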
Legibility Score: For a perturbed token $\tilde{t}$, a latent legibility function $\ell(\tilde{t})$ is introduced, with the binary legibility label $y(\tilde{t}) = \mathbb{1}[\ell(\tilde{t}) \ge \tau]$ for a fixed threshold $\tau$. Higher $\ell(\tilde{t})$ implies higher legibility for human readers (Seth et al., 2023).
Benign Visual Perturbations for Images: For VQA and LVMs, benign visual perturbations include pixel shifts, padding/cropping, minor geometric transformations (rotation, scaling), text overlays, and compositional artifacts. Operators include horizontal cyclic shifts $T_{\delta}$, zero-padding/cropping $C_{k}$, scaling $S_{\alpha}$, and rotation $R_{\theta}$, all of which maintain semantic invariance (Rosenfeld et al., 14 Nov 2025).
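A minimal sketch of these operators on NumPy arrays (the function names and signatures are ours, not from the cited work); each transform leaves the semantic content of the image intact.

```python
import numpy as np
from scipy import ndimage

def cyclic_shift(img: np.ndarray, dx: int) -> np.ndarray:
    """Horizontal cyclic shift: roll columns by dx pixels (content wraps around)."""
    return np.roll(img, dx, axis=1)

def pad_border(img: np.ndarray, k: int) -> np.ndarray:
    """Zero-padding: add a k-pixel black border on every side."""
    widths = ((k, k), (k, k)) + ((0, 0),) * (img.ndim - 2)
    return np.pad(img, widths)

def crop_border(img: np.ndarray, k: int) -> np.ndarray:
    """Cropping: remove a k-pixel border on every side."""
    return img[k:-k, k:-k] if k > 0 else img

def scale(img: np.ndarray, alpha: float) -> np.ndarray:
    """Isotropic rescaling of the spatial dimensions by factor alpha (bilinear)."""
    factors = (alpha, alpha) + (1,) * (img.ndim - 2)
    return ndimage.zoom(img, factors, order=1)

def rotate(img: np.ndarray, theta_deg: float) -> np.ndarray:
    """Small rotation by theta_deg degrees, keeping the original canvas size."""
    return ndimage.rotate(img, theta_deg, reshape=False, order=1)
```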
2. Methodologies for Generation and Measurement
Textual Variant Synthesis:
- Homoglyph attack: Replacing each character with a visually similar neighbor drawn from the ECES (easy character embedding), DCES (description-based), or ICES (image-embedding) spaces.
- Differential Evolution for Diacritic Attacks: Population-based black-box optimization maximizing drops in model confidence or output similarity. Offspring generation leverages mutation and crossover over insertion positions and diacritic marks; fitness is measured by task-specific model objectives (e.g., label probability or a sequence-level metric) (Boucher et al., 2023).
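A schematic of the black-box search, written as a hedged sketch: the population size, mutation step, and fitness function here are illustrative rather than the exact configuration of Boucher et al. (2023), and the mutation below is a simple random replacement rather than a full difference-vector update; `model_confidence` stands in for whatever task score the attacker wants to drive down.

```python
import random

DIACRITICS = [chr(cp) for cp in range(0x0300, 0x0370)]

def de_diacritic_attack(text, model_confidence, budget=2, pop_size=20, iters=30):
    """Evolutionary black-box search: each candidate is a list of
    (position, diacritic) insertions; fitness is the drop in model confidence."""
    positions = [i for i, ch in enumerate(text) if ch.isalpha()]

    def apply(cand):
        out = text
        # Insert from right to left so earlier indices stay valid.
        for pos, mark in sorted(cand, reverse=True):
            out = out[: pos + 1] + mark + out[pos + 1:]
        return out

    def random_candidate():
        return [(random.choice(positions), random.choice(DIACRITICS))
                for _ in range(budget)]

    def fitness(cand):
        return -model_confidence(apply(cand))  # lower confidence = fitter

    population = [random_candidate() for _ in range(pop_size)]
    for _ in range(iters):
        for i, parent in enumerate(population):
            # Mutation/crossover: resample one insertion of the parent at random.
            child = list(parent)
            child[random.randrange(budget)] = (random.choice(positions),
                                               random.choice(DIACRITICS))
            if fitness(child) > fitness(parent):
                population[i] = child
    return apply(max(population, key=fitness))
```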
Legibility Evaluation:
- Binary F₁ and pairwise ranking accuracy via human annotation (LEGIT dataset: 29,630 classification, 13,690 pairwise examples) (Seth et al., 2023).
- Vision-based: Rendered glyph distance or OCR model embeddings (mean cosine distance, TrOCR-MT scoring).
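A sketch of the rendered-glyph distance idea (font choice, rendering size, and scoring are ours; the cited work uses OCR-model embeddings rather than raw pixels): render the clean and perturbed tokens to bitmaps and compare them with cosine similarity.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render(text: str, size=(200, 32)) -> np.ndarray:
    """Rasterize text to a grayscale bitmap (black background, white glyphs).
    PIL's default bitmap font is used here; covering arbitrary Unicode glyphs
    and combining marks would require a full TrueType font."""
    img = Image.new("L", size, color=0)
    ImageDraw.Draw(img).text((2, 2), text, fill=255, font=ImageFont.load_default())
    return np.asarray(img, dtype=np.float32).ravel()

def visual_similarity(clean: str, perturbed: str) -> float:
    """Cosine similarity between rendered bitmaps, a crude legibility proxy:
    values near 1 mean the perturbed token still looks like the clean one."""
    a, b = render(clean), render(perturbed)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
```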
Visual Perturbation in VLMs:
- Consistency/stability is measured via the answer entropy $H_i$ of a model's answers across benign variants of sample $i$. The per-sample stability indicator $s_i$ (1 when the answer is identical across all variants, 0 otherwise) quantifies invariance under benign transformation (Rosenfeld et al., 14 Nov 2025).
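A minimal sketch of this measurement (function names are illustrative): query the model on each benign variant of a sample, compute the empirical entropy of the returned answers, and mark the sample as stable when that entropy is zero.

```python
import math
from collections import Counter

def answer_entropy(answers):
    """Empirical entropy H of the answer distribution across benign variants."""
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def stability_indicator(answers):
    """1 if the model gives the same answer under every benign perturbation, else 0."""
    return int(answer_entropy(answers) == 0.0)

# Example: answers collected from a VLM on shifted/padded variants of one image-question pair.
answers = ["a red car", "a red car", "a red car", "a truck"]
print(answer_entropy(answers), stability_indicator(answers))
```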
3. Human Robustness and Model Fragility
Extensive experimental evidence demonstrates that humans maintain high comprehension under substantial benign modification, while models deteriorate rapidly:
| Condition (Type / Level) | Human Char. Error Rate | Model Performance Drop |
|---|---|---|
| ECES homoglyphs, p = 0.8 | 2.8% | Up to –82% |
| DCES, p = 0.8 | 7% | –68 pp (F₁) |
| Diacritics, β = 2 | 2–5% | Below random guessing |
| VLM visual shift (8 px) | n/a | 7–9% answer flips |
| Text overlay (VQA) | n/a | 34–92% instability |
Humans reliably recover underlying words/sentences (≥90% accuracy at heavy corruption), whereas “defended” ViT/OCR/NLP models exhibit steep linear degradation as perturbation budget increases (e.g., up to –92.6% F₁, –63.7% accuracy) (Boucher et al., 2023, Eger et al., 2019).
4. Analytical Models, Benchmarks, and Impact
Benign perturbations are not adversarial in the classical sense; they preserve semantics yet exploit non-invariances and parametric biases.
Vision-Language Models and Consistency:
- Pixel shifts/padding induce activation non-smoothness and receptive field confusion in CNNs/VLMs.
- Paraphrase and translation rephrasings reveal strong sample-level instability—7–16% instance-level flipping for open-source models; >90% for text overlays in GPT-4o (Rosenfeld et al., 14 Nov 2025).
- Stability patterns serve as robust indicators of correctness—samples stable under both visual and textual perturbations are 5–12 percentage points more likely to be answered correctly.
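That correctness gap can be computed directly from logged predictions; the record format below is an assumption of ours, not a protocol from the cited work.

```python
def accuracy_gap(records):
    """records: iterable of (is_stable: bool, is_correct: bool) pairs.
    Returns accuracy among stable samples minus accuracy among unstable ones."""
    def acc(group):
        return sum(c for _, c in group) / len(group) if group else float("nan")
    stable = [r for r in records if r[0]]
    unstable = [r for r in records if not r[0]]
    return acc(stable) - acc(unstable)

records = [(True, True), (True, True), (True, False),
           (False, False), (False, True), (False, False)]
print(f"stability-conditioned accuracy gap: {accuracy_gap(records):+.2f}")
```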
Beneficial Visual Noise:
- The VAP (Visual Adversarial Perturbation) optimization leverages contrastive surrogate losses to increase factual grounding, discouraging model hallucination by penalizing answer similarity to noised images and “null” prompts (Zhang et al., 31 Jan 2025).
- VAP reduces hallucination metrics (POPE, CHAIR, BEAF) consistently across 8 models without model modification.
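One way to read that objective, written as a hedged formulation (the symbols and the specific composition of terms are ours, not the exact loss of Zhang et al., 31 Jan 2025): the perturbation is optimized so that the model's answer to the perturbed image moves away from the answers it produces when visual evidence is degraded or absent,

$$\delta^{\star} = \arg\min_{\|\delta\|_{\infty}\le \epsilon}\;\operatorname{sim}\big(f(x+\delta, q),\, f(\tilde{x}, q)\big) + \operatorname{sim}\big(f(x+\delta, q),\, f(\varnothing, q)\big),$$

where $f(\cdot, q)$ is the model's answer (or its embedding) for question $q$, $\tilde{x}$ is a heavily noised copy of the image, and $\varnothing$ denotes the image-free ("null") prompt; minimizing similarity to these evidence-poor answers pushes the response toward visually grounded content.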
5. Defenses, Measurement, and Robust Design
Defenses against benign perturbations are necessary but non-trivial:
- Preprocessing: Strip combining marks, revert homoglyphs, or recover canonical character sets (see the sketch after this list).
- Adversarial Training: Augment training with visually and textually perturbed data; use perturbation-aware objectives (jointly minimize losses on clean and modified samples).
- Visual Embeddings: Infuse models with visually-informed character representation to improve resilience, although scale and font invariance remain limitations.
- Rule-Based Recovery: For simple attacks (e.g., ECES homoglyphs), nearest-standard mapping via embedding brings near-complete robustness.
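A minimal sketch of the preprocessing defense (the homoglyph table here is a tiny illustrative subset; production mappings are far larger, e.g., derived from the Unicode confusables data): decompose the string, drop combining marks, and map known confusables back to their canonical ASCII forms.

```python
import unicodedata

# Tiny illustrative confusable table (Cyrillic look-alikes); real deployments
# use much larger mappings.
HOMOGLYPHS = {"а": "a", "е": "e", "і": "i", "о": "o", "р": "p", "с": "c"}

def sanitize(text: str) -> str:
    """Strip combining diacritics and revert known homoglyphs to canonical characters."""
    # NFKD splits precomposed characters into base character + combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in stripped)

print(sanitize("bénіgn"))  # -> "benign" for marks/confusables covered by the table
```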
| Method | Relative Gain (e.g., POS tagging, p = 0.5, DCES) |
|---|---|
| Rule-based recovery | +0.15 (up to 100% on ECES) |
| Adversarial training | +0.12 |
| Visual embeddings | Slight; helps only character-level tasks |
| Adversarial training + visual embeddings | +0.20 |
The shielding methods close much, but not all, of the performance gap induced by benign perturbations. Limitations include domain shift fragility and embedding sensitivity (Eger et al., 2019).
6. Future Directions and Open Challenges
Research trajectories include:
- Quantification of Legibility and Robustness: Expansion of datasets (e.g., LEGIT), multi-lingual and cross-script schemes, learned scoring functions calibrated to human comprehension.
- Invariant Model Architectures: Incorporation of group-equivariant layers (e.g., E(2)-CNNs), improved interpolation/backbone design to mitigate sampling sensitivity.
- Consistency Losses for Training: Use sample-level entropy and cross-modal answer stability during model optimization.
- Evaluation Protocols: Standardize consistency-based benchmarks alongside adversarial tests; use open-source model stability to monitor and anticipate closed-source model failures.
- Detection and Filtering: Development of detectors for visually/textually perturbed input prior to downstream processing (beyond naive hard-mapping).
- Extensions to Textual Perturbations: Contrastive “uncertainty” regularization for natural language inputs to reduce parametric bias in generative models (Zhang et al., 31 Jan 2025).
A plausible implication is that rigorous scrutiny of benign perturbations—including visually innocuous text attacks, pixel-level image modifications, and paraphrasing—will continue to expose foundational vulnerabilities in vision-language systems. Advances in invariance-driven modeling and robust data-centric training can enhance trustworthiness and reliability in practical deployment.