
Universal Defensive Underpainting Patch

Updated 8 November 2025
  • The paper introduces UDUP, a universal background patch that robustly degrades OCR performance while keeping text visually clear for humans.
  • The methodology employs adversarial optimization with dual losses, targeting both detection confidence and mid-layer feature alignment to enhance transferability.
  • Empirical results demonstrate that UDUP scales across diverse image sizes and maintains robustness under common transformations such as scaling and JPEG compression.

A Universal Defensive Underpainting Patch (UDUP) is an adversarially optimized, content-agnostic, universal pattern designed to be applied in the background regions ("underpainting") of images, specifically to disrupt computer vision systems such as Optical Character Recognition (OCR) while maintaining visual integrity and readability for human viewers. UDUPs are characterized by their universality (same patch applies to arbitrary images), scalability (adaptable to images of any size via tiling), robustness (resilience to common transforms such as scaling and JPEG compression), and transferability (effectiveness across disparate detection models and commercial OCR engines). This paradigm offers substantial improvements over prior character-level or image-dependent defenses against unauthorized automated text extraction and other forms of adversarial exploitation.

1. Motivation and Adversarial Context

The development of UDUPs was driven by the need for robust, scalable, and visually unobtrusive defenses against framework-agnostic automated extraction of sensitive or valuable text via OCR systems. Traditional character-perturbing defenses required per-image optimization, did not generalize over arbitrary document geometries, and often degraded human readability. In screenshot-based piracy or large-scale document scraping settings, per-character or per-image perturbations proved logistically infeasible.

UDUPs exploit the observation that OCR systems depend not only on character shape, but also on low-level and mid-level features contained in the background. By targeting the underpainting—i.e., the non-character, non-salient regions which OCR models leverage for segmentation and layout analysis—rather than the characters themselves, a universal patch can be tiled for any image size with little to no disruption to readability, yet effectively degrade or confuse the OCR pipeline (Deng et al., 2023).

2. Technical Construction and Optimization

The UDUP construction is formulated as the search for a small, fixed-size patch $\mathbf{p}^* \in [1-\epsilon, 1]^{s \times s}$, with $\epsilon$ controlling the maximum allowed perturbation and $s \ll \min(h_i, w_i)$ denoting the patch size (where $h_i, w_i$ are the target image dimensions). The patch is applied via a tiling operator $\mathcal{T}_i(\mathbf{p}^*)$ that repeats $\mathbf{p}^*$ non-overlappingly over the background mask $\mathbf{M}_i$ for each image $\mathbf{x}_i$.

The optimization objective is:

$$\mathbf{p}^* = \operatorname*{argmin}_{\mathbf{p}} \, \mathbb{E}_{(\mathbf{x}, \mathbf{M}) \sim \mathcal{D}} \left[ \mathcal{L}^p(\mathbf{x}, \mathbf{p}) + \lambda \mathcal{L}^m(\mathbf{x}, \mathbf{p}) \right], \quad \text{s.t.} \;\; \| \mathbf{1} - \mathbf{p} \|_\infty < \epsilon$$

  • $\mathcal{L}^p$: primary loss reducing detection confidence in true text regions.
  • $\mathcal{L}^m$: multi-middle-layer feature alignment loss, enhancing transferability across diverse OCRs by matching internal activations to pure underpainting images.
  • $\lambda$: hyperparameter balancing the two losses.

The loss $\mathcal{L}^p$ is computed via a reference scene text detector (e.g., CRAFT), encouraging low probability in the detector's score map; $\mathcal{L}^m$ averages the mid-layer feature differences between the underpainted image (with text) and the pure underpainting (without text).
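As a concrete (and heavily simplified) illustration, the two losses can be sketched with stub functions standing in for the detector's score map and mid-layer features; `fake_score_map`, `fake_mid_features`, and the `lam` weighting are illustrative placeholders, not the paper's implementation:

```python
import numpy as np

def fake_score_map(img):
    # Stand-in for a text detector's per-pixel text-confidence map.
    return img.mean(axis=-1) if img.ndim == 3 else img

def fake_mid_features(img):
    # Stand-in for a mid-layer activation extractor (crude downsample).
    return img[::2, ::2]

def loss_primary(underpainted, text_mask):
    # L^p: mean detection confidence inside true text regions -> minimize.
    score = fake_score_map(underpainted)
    return float(score[text_mask].mean())

def loss_midlayer(underpainted, pure_underpainting):
    # L^m: mean absolute mid-layer gap between the underpainted image
    # (with text) and the pure underpainting (without text) -> minimize.
    fa = fake_mid_features(underpainted)
    fb = fake_mid_features(pure_underpainting)
    return float(np.abs(fa - fb).mean())

def total_loss(underpainted, pure, text_mask, lam=1.0):
    # Combined objective L^p + lambda * L^m.
    return loss_primary(underpainted, text_mask) + lam * loss_midlayer(underpainted, pure)
```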

Training involves projected gradient descent with random scaling for both the input and the post-fusion image at each iteration, enhancing robustness to scale transformations and mimicking real screenshot conditions. The patch parameter $\mathbf{p}$ is clipped to the allowed region $[1-\epsilon, 1]$, and momentum is used to stabilize convergence.
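The update loop can be sketched as follows; the quadratic surrogate loss here is a stand-in for the real $\mathcal{L}^p + \lambda \mathcal{L}^m$ (which requires a detector), and the random-scaling augmentation is omitted for brevity:

```python
import numpy as np

def optimize_patch(s=8, eps=0.12, lr=0.05, mu=0.9, steps=200):
    """Toy PGD with momentum; the true gradient would come from the detector."""
    p = np.ones((s, s))                 # start from a blank (white) patch
    v = np.zeros_like(p)                # momentum buffer
    target = 1.0 - eps                  # surrogate optimum inside the box
    for _ in range(steps):
        grad = 2.0 * (p - target)       # gradient of surrogate ||p - target||^2
        v = mu * v + grad               # momentum accumulation
        p = p - lr * v                  # descent step
        p = np.clip(p, 1.0 - eps, 1.0)  # project back onto [1 - eps, 1]
    return p
```

The projection step (`np.clip`) is what keeps the patch within the perturbation budget regardless of how aggressive the gradient step is.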

3. Deployment and Application

Upon deployment, the learned $\mathbf{p}^*$ is seamlessly tiled across the image background using the modulus operator to ensure periodicity: $\mathbf{u}_i[m, n] = \mathbf{p}^*[m \bmod s, n \bmod s]$ for all background pixels $(m, n)$ in image $\mathbf{x}_i$, as determined by the mask $\mathbf{M}_i$. Text characters remain unaltered, preserving visual quality and legibility.
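Assuming a grayscale image and a boolean background mask (True where underpainting is allowed), the modulus-based tiling can be sketched as:

```python
import numpy as np

def tile_underpainting(image, patch, background_mask):
    """Apply u[m, n] = p*[m mod s, n mod s] on background pixels only."""
    h, w = image.shape[:2]
    s = patch.shape[0]
    rows = np.arange(h) % s                        # periodic row indices
    cols = np.arange(w) % s                        # periodic column indices
    tiled = patch[rows[:, None], cols[None, :]]    # full-size periodic field
    out = image.copy()
    out[background_mask] = tiled[background_mask]  # text pixels stay untouched
    return out
```

Because the tiling is purely index arithmetic, the same patch covers an image of any size with no re-optimization.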

This tiling-based application confers two crucial properties:

  • Scalability: The same patch instance can cover documents, web pages, or arbitrary screenshot regions of varying sizes.
  • Content agnosticism: No knowledge of specific textual content, language, font, or layout is needed.

The UDUP remains robust under common post-processing: standard web or office workflows involving scaling (zooming, window resizes) and JPEG compression, due to random rescaling incorporated during training.

4. Empirical Performance and Robustness

Extensive evaluation on a 596-image real-world dataset (web pages, documents, varied languages and layouts) demonstrates that a $30 \times 30$ patch with mean underpainting intensity (MUI) $\approx 0.09$ reduces CRAFT OCR recall to $3.9\%$ of baseline (Deng et al., 2023). Precision is also reduced (more substantially for black-box systems), further degrading OCR reliability.

Robustness against scaling was validated for zoom ranges of $60{-}200\%$, with recall maintained below $29\%$ of baseline even under strong zoom-out; JPEG compression at low quality ($Q = 50$) still suppresses recall to $21\%$. The defense is transferable: black-box attacks against EasyOCR, PAN++, PSENet, and two commercial Chinese OCRs all exhibited recall drops (precision often degraded to the point of producing unusable noise), even though none of these models was used in UDUP optimization.

Visual quality was preserved across a wide MUI range: at MUI $= 0.06$, the underpainting is imperceptible to most readers, while increasing it up to $0.12$ for greater protection may yield mild artifacts. The patch remains effective for arbitrary screenshot windows, complex LaTeX/image backgrounds, and multilingual content.

5. Comparison with Prior Defenses

| Property | Prior Defenses | UDUP |
|---|---|---|
| Patch application | Per-image / per-character | Universal, tiled per image |
| What is modified | Character pixels (salient) | Background only (underpainting) |
| Model adaptivity | None (requires per-case tuning) | Agnostic / plug-and-play |
| Visual quality | Often degraded | High (characters preserved) |
| Robustness (scaling/compression) | Unaddressed or poor | Explicitly robust |

Unlike character-perturbing methods, which risk rendering text unreadable or necessitate new perturbations for every new image or window, UDUP provides a fixed, class-agnostic, and scalable defense. Compared to universal frame or border-based approaches, UDUP uniquely leverages the periodicity and texture of the background to evade model recognition, rather than interfering with content bounding boxes or inserting visually salient artifacts.

6. Limitations and Open Issues

UDUP is less effective against certain OCR architectures (e.g., DBNet) in recall metrics, though these usually suffer a severe drop in precision, producing high false positives and practically unusable output for attackers. Aggressive settings for MUI or patch size may introduce perceptible patterning, which in rare cases could distract users.

The technique's transferability to extreme scaling ($<60\%$) may erode, though in practice OCR (and human reading) of such images is diminished as well. The defense against adaptive attackers who retrain or fine-tune OCRs to sidestep underpainting perturbations is not directly addressed in the original publication. Furthermore, UDUP assumes access to an accurate background mask; masking errors may degrade visual quality or defensive effectiveness.

A plausible implication is that integrating UDUP with additional, perhaps randomized, defense strategies (such as dynamic underpainting or feature-space perturbations during PDF/HTML rendering) could further harden the scheme against adaptive attacks.

7. Implementation and Practical Considerations

UDUP training requires only standard deep learning infrastructure (GPU-accelerated projected gradient descent) and one or more scene text detector models (e.g., CRAFT) for white-box loss computation. The Python-style pseudocode in the original paper provides a minimal realization; trained patches and inference code are available at https://github.com/QRICKDD/UDUP. Deployment consists solely of background tiling, making it readily adaptable to web back-ends, document renderers, or prepress pipelines.

For practical integration:

  • Choose patch size $s$ small relative to typical font sizes to balance camouflage and effectiveness.
  • Tune $\epsilon$ and MUI to trade off visual impact and strength of defense.
  • Avoid masking character regions by accurately estimating $\mathbf{M}_i$, preserving text clarity.
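As a small tuning aid — assuming MUI is measured as the mean deviation of the patch from pure white (patch values lie in $[1-\epsilon, 1]$), which is a plausible but hypothetical reading — budget checks for a candidate patch might look like:

```python
import numpy as np

def mean_underpainting_intensity(patch):
    # Hypothetical MUI: average darkening relative to a white background.
    return float(np.mean(1.0 - patch))

def within_budget(patch, eps):
    # The L_inf constraint ||1 - p||_inf < eps from the optimization objective.
    return float(np.max(1.0 - patch)) < eps
```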

The plug-and-play nature, scalability, and black-box transferability of UDUP align with the operational needs of large-scale content publishers seeking resilience against both "gray box" and "black box" automated text extraction threats.


In summary, the Universal Defensive Underpainting Patch constitutes a content-agnostic, highly adaptable, and visually inconspicuous adversarial defense against OCR, setting a new standard for universal, scalable text anti-piracy while addressing numerous limitations of prior approaches (Deng et al., 2023).
