
Non-Capturable Depiction in AI and Media

Updated 25 January 2026
  • Non-capturable depiction is a concept denoting media rendering methods that resist unauthorized machine replication while preserving narrative integrity.
  • It employs techniques like imperceptible perturbations, geometry cloaks, and concept erasure to disrupt generative AI, often reducing model performance by quantifiable margins.
  • Its applications span computer vision, digital storytelling, and robust recognition, addressing cross-depiction challenges and safeguarding sensitive content.

Non-capturable depiction refers to the intentional or inherent rendering of visual, narrative, or conceptual content in a manner that resists unauthorized machine replication, interpretation, or exploitation, while remaining perceptually valid and communicative to human observers. The term spans adversarial defenses against generative AI, accessibility affordances in media storytelling, robustness in computer vision, and theoretical frameworks for generative model safety. This concept has become central to a wide array of research seeking either to protect the integrity of authored media (e.g., grayscale artwork, copyrighted images, sensitive objects/styles) or to enable authentic representation of experiences impractical to record.

1. Conceptual Foundations and Formal Definitions

Non-capturable depiction is formally characterized as a transformation or depiction that, for a given experience or artifact e, produces a media output m such that m cannot be effectively exploited, replicated, or recognized by current AI or CV systems, and/or renders aspects of e that are impossible to capture via conventional means. In disability storytelling, Niu et al. define this as

A_{nc} = \{\, m \in M \mid \exists\, e \in E \setminus C : m = G(e) \,\}

where E is the space of lived experiences, C the subset amenable to real-world capture, M the set of media artifacts, and G a generative-AI mapping (Niu et al., 18 Jan 2026). In computer vision, Cai et al. establish that artwork and abstract forms occupy feature distributions far outside the photographic manifold (KL-divergence exceeding 0.4), causing classifiers trained on photographs to fail catastrophically on art (Cai et al., 2015).
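The set definition above can be illustrated with a toy finite example; all sets and the mapping G here are hypothetical stand-ins, not data from the paper:

```python
# Toy illustration of A_nc = { m in M | exists e in E \ C : m = G(e) }.
# E: lived experiences; C: the subset amenable to real-world capture;
# G: a stand-in for a generative-AI mapping from experiences to media.

E = {"denied_access", "hazard", "routine_commute", "aspirational_future"}
C = {"routine_commute"}                      # only this can be filmed directly

def G(experience):
    """Stand-in generative mapping: experience -> media artifact."""
    return f"video:{experience}"

M = {G(e) for e in E}                        # all producible media artifacts

# Media artifacts depicting experiences outside the capturable subset:
A_nc = {m for m in M if any(G(e) == m for e in E - C)}

print(sorted(A_nc))
```

Under this toy mapping, the artifact for the directly filmable experience falls outside A_nc, while the other three artifacts are non-capturable depictions.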

In adversarial machine learning, non-capturable depiction is algorithmically realized via imperceptible perturbations (δ, Δ) or architectural edits, such that protected images become "uncapturable" for targeted generative or reconstruction systems (Nii et al., 10 Oct 2025, Song et al., 2024, Zhao et al., 2023, Carter, 29 May 2025).

2. Defensive Mechanisms and Algorithms

A central class of methods for non-capturable depiction involves embedding carefully crafted, imperceptible perturbations into visual media ("Uncolorable Examples," "geometry cloaks," "unlearnable examples"). These perturbations invalidate unauthorized generative outputs or block model learnability:

Uncolorable Examples (PAChroma):

  • Add δ to grayscale input x_l, forming x_l + δ, such that any colorization network G yields an output G(x_l + δ) that remains gray, suppressing colorfulness by 80–90% (Nii et al., 10 Oct 2025).
  • δ is optimized under imperceptibility constraints (ℓ∞ bound, high PSNR/SSIM), Laplacian masking (concentrating the effect on image texture and edges), and a structure-invariant input ensemble (shifts, flips, JPEG, DCT) for transferability and robustness.
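The optimization mechanics can be sketched with NumPy; the gradient function here is a hypothetical stand-in, since the real method back-propagates a colorfulness loss through a colorization network:

```python
import numpy as np

rng = np.random.default_rng(0)
EPS = 16 / 255                                  # ℓ∞ budget from the paper
x = rng.random((32, 32)).astype(np.float32)     # grayscale image in [0, 1]

def laplacian_mask(img):
    """Concentrate the perturbation on edges/texture (4-neighbour Laplacian)."""
    lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)
    mag = np.abs(lap)
    return mag / (mag.max() + 1e-8)

def surrogate_grad(x_adv):
    """Hypothetical stand-in for the gradient of a colorfulness loss
    back-propagated through a colorization network (not included here)."""
    return np.sign(x_adv - x.mean())

delta = np.zeros_like(x)
mask = laplacian_mask(x)
for _ in range(10):                             # PGD-style iterations
    delta += 0.01 * mask * surrogate_grad(x + delta)
    delta = np.clip(delta, -EPS, EPS)           # project back to ℓ∞ ball
    delta = np.clip(x + delta, 0.0, 1.0) - x    # keep the image valid

print(float(np.abs(delta).max()) <= EPS + 1e-6)
```

The Laplacian mask scales each pixel's update by local edge strength, so flat regions (where perturbations are most visible) receive almost none of the budget.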

Geometry Cloaks (for TGS 3D Reconstruction):

  • A perturbation Δ is computed such that x′ = x + Δ forces the TGS pipeline f(x′) to reconstruct a user-specified watermark point-cloud pattern M instead of the faithful geometry (Song et al., 2024).
  • PGD optimization minimizes L_CD(f(x + Δ), M) under an ℓ∞ constraint; the effect is confirmed to survive postprocessing (compression, scaling).
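The Chamfer-distance objective L_CD at the core of this optimization can be written compactly; the point clouds below are random stand-ins, not actual reconstructions:

```python
import numpy as np

def chamfer_distance(P, Q):
    """Symmetric Chamfer distance between point clouds of shape (N, 3), (M, 3)."""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # pairwise dists
    return d.min(axis=1).mean() + d.min(axis=0).mean()

rng = np.random.default_rng(1)
geometry = rng.random((64, 3))           # faithful reconstruction (stand-in)
watermark = geometry + 0.5               # user-specified watermark pattern M

# Identical clouds score zero; the cloak's PGD objective drives the
# reconstruction f(x + Δ) toward `watermark`, i.e. minimizes CD(f(x+Δ), M).
print(chamfer_distance(geometry, geometry))        # 0.0
print(chamfer_distance(geometry, watermark) > 0)
```

A rising Chamfer distance between the protected reconstruction and the true geometry (the ×50–70 factor reported in Section 4) is what makes the capture unusable.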

Unlearnable Diffusion Perturbations (EUDP):

  • For each training image x_i, find δ_i such that any diffusion model trained on x̃_i = x_i + δ_i cannot learn useful generative representations, i.e., generation FID increases by 70–80% and precision/recall drop by ~35% (Zhao et al., 2023).
  • An enhanced scheduler samples the most influential timesteps during the optimization of δ_i.
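The scheduler idea can be sketched as importance-weighted timestep sampling; the influence profile below is purely illustrative (the paper's actual weighting is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1000                                  # diffusion timesteps

# Hypothetical per-timestep influence scores: mid-range timesteps are assumed
# to matter most for learnability (illustrative shape, not from the paper).
t = np.arange(T)
influence = np.exp(-((t - 500) / 200.0) ** 2)
probs = influence / influence.sum()

# Enhanced scheduler: sample optimization timesteps proportionally to
# influence rather than uniformly, so each δ_i targets the steps that matter.
sampled = rng.choice(T, size=5000, p=probs)
print(sampled.min() >= 0 and sampled.max() < T)
```

Compared with uniform sampling, this concentrates the perturbation-optimization budget where the denoising loss is most sensitive.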

Concept Erasure in Diffusion (TRACE):

  • Edit cross-attention key/value matrices to project concept tokens e_w onto neutral equivalents e_u, then fine-tune LoRA adapters with trajectory-phase losses applied after the semantic breaking point t_0 (Carter, 29 May 2025).
  • Yields near-perfect suppression of targeted concepts (e.g., style, faces, object classes, explicit content) with minimal side-effect on other prompts, confirmed by empirical benchmarks.
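The key/value projection step can be sketched as a rank-one matrix edit; this is a minimal linear-algebra illustration with random embeddings, omitting TRACE's LoRA fine-tuning and trajectory-phase losses:

```python
import numpy as np

rng = np.random.default_rng(3)
d_txt, d_attn = 8, 4
W_K = rng.standard_normal((d_attn, d_txt))   # cross-attention key matrix

e_w = rng.standard_normal(d_txt)             # embedding of the target concept
e_u = rng.standard_normal(d_txt)             # neutral surrogate embedding

# Rank-one edit mapping the concept token onto its neutral equivalent:
# W'e_w = W e_u, while directions orthogonal to e_w are left untouched.
delta_W = np.outer(W_K @ (e_u - e_w), e_w) / (e_w @ e_w)
W_edited = W_K + delta_W

print(np.allclose(W_edited @ e_w, W_K @ e_u))        # concept redirected
e_perp = e_u - (e_u @ e_w) / (e_w @ e_w) * e_w       # direction ⟂ e_w
print(np.allclose(W_edited @ e_perp, W_K @ e_perp))  # others preserved
```

Because the edit is rank-one along e_w, unrelated prompts see an (almost) unchanged key matrix, which is the mechanism behind the low collateral damage noted above.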

3. Impact on Digital Storytelling and Accessibility

Non-capturable depiction extends beyond adversarial defense, constituting a core GenAI affordance for representing experiences otherwise inaccessible or unsafe to record. Niu et al. ground the concept in digital storytelling theory (Niu et al., 18 Jan 2026), illustrating deployment in videos created by people with disabilities:

  • Enables rendering of denied-access moments, intense emotional or bodily cues, hazardous scenarios, or aspirational future events.
  • Achieved via multimodal, prompt-driven GenAI pipelines (LLM scripting, DALL·E visual generation, TTS narration), overcoming privacy, safety, or logistical constraints.
  • Identified limitations include breakdowns in rendering assistive technologies, scene/character continuity, emotional authenticity, and unwanted artifacts.

Design guidelines emphasize structured prompt templates, persistent character memory, flexible media formats, bias correction, and community-driven taxonomies to strengthen faithful non-capturable depiction in storytelling tools.

4. Evaluation, Measurement, and Robustness

Empirical evaluation of non-capturable depiction relies on both quantitative and qualitative metrics:

| Application Domain | Perturbation Metric | Output Metric | Robustness Assessment |
|---|---|---|---|
| PAChroma (colorization) | ‖δ‖∞ ≤ 16/255 | Colorfulness (CF drop 80–90%) | Effect maintained after JPEG, crop |
| Geometry Cloak (TGS) | ‖Δ‖∞ ≤ 2–8 | Chamfer distance (↑ ×50–70) | Watermark survives compression |
| EUDP (diffusion) | ‖δ_i‖∞ ≤ 16/255 | FID, precision, recall | Effect persists after postprocessing |
| TRACE (concept erasure) | Norm of ΔW_K, ΔW_V | Acc_e, FID, StyleRem, Nudity | Low collateral damage on clean prompts |

Certified imperceptibility (PSNR > 30 dB, SSIM > 0.95) and cross-model transferability (CF drops of 13–28% in black-box settings) are reported. Geometry-cloak efficacy in watermark extraction and EUDP's impact on fine-tuned diffusion models (Monet, DreamBooth) are quantitatively confirmed.
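The two headline metrics, PSNR for imperceptibility and the Hasler–Süsstrunk colorfulness score, are straightforward to compute; the uniform noise below is a stand-in at half the paper's budget (real optimized perturbations concentrate their energy on edges to stay above the PSNR threshold at larger budgets):

```python
import numpy as np

def psnr(x, y, peak=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, peak]."""
    mse = np.mean((x - y) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def colorfulness(img):
    """Hasler–Süsstrunk colorfulness metric for an (H, W, 3) RGB image."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    rg, yb = r - g, 0.5 * (r + g) - b
    return (np.hypot(rg.std(), yb.std())
            + 0.3 * np.hypot(rg.mean(), yb.mean()))

rng = np.random.default_rng(4)
x = rng.random((16, 16, 3))
# Half-budget uniform noise as a simple stand-in perturbation:
x_protected = np.clip(x + rng.uniform(-8 / 255, 8 / 255, x.shape), 0, 1)

print(psnr(x, x_protected) > 30)             # imperceptibility threshold
gray = np.repeat(x.mean(axis=-1, keepdims=True), 3, axis=-1)
print(colorfulness(gray) < colorfulness(x))  # gray output scores lower
```

A successful anti-colorization defense shows exactly this pattern: high PSNR between original and protected inputs, but collapsed colorfulness in the attacker's output.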

5. Cross-Depiction Problem and Recognition Beyond Appearance

Non-capturable depiction fundamentally underlies the cross-depiction problem in CV, as explored by Cai et al. (Cai et al., 2015). Abstracted or stylized artwork causes appearance-based models trained on photographs to suffer massive performance drops (Δ_{P→A} up to 40%). Robust recognition across non-capturable depictions arises only in part-based models that encode spatial relations (deformation models, multi-label graphs):

  • Spatial topology of object parts remains invariant across styles—even when low-level texture/statistics shift dramatically.
  • Empirically, Deformable Part Models (DPM) and Multi-label Graphs reduce cross-domain drop to 2–13%, while CNNs without spatial regularization suffer 23%.
  • This suggests resilient cross-depiction recognition will require fusion of deep representations with explicit part-relational and geometric priors.

Domain adaptation fails under a large domain gap (KL-divergence > 0.4) unless spatial structure is also realigned. Future architectures should focus on joint alignment of appearance and spatial configurations.
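The KL-divergence gap can be illustrated with a 1-D Gaussian fit to hypothetical feature responses; the photo/art statistics here are invented for the sketch, not taken from Cai et al.:

```python
import numpy as np

def gaussian_kl(mu0, var0, mu1, var1):
    """KL( N(mu0, var0) || N(mu1, var1) ) for 1-D Gaussians, closed form."""
    return 0.5 * (np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1)

rng = np.random.default_rng(5)
photo_feats = rng.normal(0.0, 1.0, 10_000)   # photographic feature responses
art_feats = rng.normal(1.2, 1.4, 10_000)     # stylized-art feature responses

kl = gaussian_kl(photo_feats.mean(), photo_feats.var(),
                 art_feats.mean(), art_feats.var())
print(kl > 0.4)   # a gap of this size signals cross-depiction failure risk
```

When a fitted gap like this exceeds the 0.4 threshold, purely appearance-level adaptation is unlikely to close it, which motivates the spatial-structure realignment argued for above.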

6. Extensions to Action Depiction and Temporal Context

Non-capturable depiction manifests in semantic and temporal axes in text-to-image systems, formalized in the AcT2I benchmark (Malaviya et al., 19 Sep 2025):

  • Action-centric prompts spanning rarity, emotional valence, spatial topology, and temporal extent reveal failure modes—especially for dynamic, rare, or contextually nuanced interactions (acceptance rate frequently <50% for leading T2I models).
  • Knowledge distillation via LLM-based prompt enrichment (spatial, emotional, temporal attributes) improves acceptance rates by up to 72% when moving from baseline to enriched prompts.
  • User-preference ablation confirms temporal cues are most critical for capturing "non-capturable" action semantics.
  • Automated metrics (CLIPScore, DinoScore) misalign with human judgments, highlighting the need for new, human-aligned evaluation protocols.

A plausible implication is that action depiction across non-capturable axes will require explicit embedding of relational, temporal, and affective knowledge, as well as hybrid human–machine evaluation.
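The enrichment step can be sketched with a fixed template standing in for the LLM; the prompt, attribute strings, and the `enrich` helper are all hypothetical:

```python
# Hypothetical sketch of LLM-style prompt enrichment for action-centric T2I:
# a base prompt is augmented along the three attribute axes the AcT2I study
# found most helpful, via a fixed template rather than an actual LLM call.

def enrich(prompt, spatial, emotion, temporal):
    """Attach spatial, emotional, and temporal attributes to a base prompt."""
    return f"{prompt}, {spatial}, conveying {emotion}, captured {temporal}"

base = "a gymnast dismounting the balance beam"
enriched = enrich(base,
                  spatial="feet above the beam, arms extended",
                  emotion="intense concentration",
                  temporal="mid-air at the apex of the dismount")
print(enriched)
```

The temporal clause does the heaviest lifting, consistent with the user-preference ablation above: it pins the renderer to a single instant of an otherwise ambiguous action.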

7. Practical Guidelines and Future Directions

For deployment of non-capturable depiction:

  • Apply perturbation synthesis (PAChroma, Geometry Cloak, EUDP) before publishing sensitive images or datasets.
  • Employ trajectory-constrained concept erasure for generative model safety (TRACE).
  • Combine adversarial perturbation with watermarking for dual protection and verification.
  • Equip GenAI storytelling platforms with user-facing bias correction, scene memory, and structured prompts.
  • Advance recognition models by integrating spatial/geometric modules, especially in cross-depiction settings.
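One way to combine perturbation with watermarking, as the guidelines suggest, is to encode a verification payload in the sign pattern of the perturbation itself; this is a hypothetical scheme for illustration, not a method from the cited papers:

```python
import numpy as np

rng = np.random.default_rng(6)
EPS = 8 / 255
# Keep pixel values away from 0/1 so adding ±EPS never clips:
x = rng.random((8, 8)).astype(np.float32) * 0.8 + 0.1

# Encode 64 watermark bits in the *sign pattern* of the perturbation, so a
# single δ both disrupts capture (magnitude) and carries a payload (sign).
payload = rng.integers(0, 2, x.shape)
delta = EPS * np.where(payload == 1, 1.0, -1.0)
x_protected = x + delta

# The rights holder, who keeps the original x, recovers the payload:
recovered = (x_protected - x > 0).astype(int)
print(np.array_equal(recovered, payload))
```

The dual role matters in disputes: the magnitude of δ provides protection against capture, while the recoverable sign pattern provides verifiable evidence of ownership.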

Open problems include generalizing defensive mechanisms to black-box exploiters, extending cloaks to multi-modal pipelines, and establishing scalable, human-aligned evaluation for semantic depiction. Non-capturable depiction, as formalized across these recent works, stands at the nexus of AI robustness, creative accessibility, and copyright-aware generative media.
