
EvilEye: Adversarial Threats in Gaze & Camera Systems

Updated 1 February 2026
  • EvilEye is a class of adversarial attacks targeting gaze-based and camera perception systems by manipulating feature spaces and optical inputs.
  • It employs techniques like FGSM and expectation-over-transformation to generate imperceptible perturbations, drastically reducing classifier accuracy.
  • Dynamic, sensor-aware attack modes highlight the need for advanced adversarial training and hardware–software co-design to secure AR/VR and autonomous systems.

EvilEye refers to a class of adversarial threat models and physical-world attacks targeting perception systems, specifically those relying on gaze data (such as AR/VR headsets) and camera-based machine learning pipelines. EvilEye encompasses both the generation of imperceptible digital perturbations to eye-movement features that subvert classification and dynamic, sensor-level physical perturbations delivered via transparent displays to induce misclassification under real-world operational conditions (Hagestedt et al., 2020, Han et al., 2023).

1. Formal Threat Models: Digital and Physical Channels

The term "EvilEye" is used to describe attacks where an adversary seeks to introduce small perturbations $\delta$, either added to gaze feature vectors $x$ or introduced optically to camera inputs, that fool classifiers $f(\theta, x)$ or deep models $F$. For eye-based user modelling, the adversary targets the gaze-derived feature space, manipulating fixation durations, saccade amplitudes, and higher-order reading cues so as to steer predictions away from the true class label $y$ (untargeted) or toward a specified $y_\mathrm{target}$ (targeted). The attack aims for $\|\delta\|$ to remain within the natural variation of gaze data, remaining undetectable to users and anomaly detectors (Hagestedt et al., 2020).

In camera-based autonomous systems, EvilEye is formalized as a man-in-the-middle attack using a transparent display mounted directly before the lens. The device projects computed RGB perturbations, modulated by auxiliary sensing (e.g., GPS or side camera), generating scene-specific adversarial composites that deceive $F$ across all environmental variations. The adversary models the optical transformation pipeline $f$ including intrinsic calibration, lens distortion $\kappa$, blending ratios $\alpha$, and illuminance-dependent noise $\eta(L)$ (Han et al., 2023).

2. Generation of Adversarial Perturbations

In gaze-based systems, EvilEye attacks leverage the Fast Gradient Sign Method (FGSM), adapted from vision to feature space. For a classifier loss $J(\theta, x, y)$, FGSM approximates the steepest direction to cross the decision boundary:

$\delta = \epsilon \cdot \mathrm{sign}(\nabla_x J(\theta, x, y))$

Iterative, minimal-perturbation FGSM seeks a perturbed $x'$ with $f(\theta, x') \neq y$ (untargeted) or $f(\theta, x') = y_\mathrm{target}$ (targeted), subject to a maximum $\ell_2$ budget on $\|\Delta x\|_2$, with empirical bounds confirmed to fall inside ordinary inter-sample gaze variability (Hagestedt et al., 2020).

For physical EvilEye attacks, the adversarial perturbation $\delta$ is optimized in the display domain via expectation-over-transformation (EOT) steps:

$\min_{\delta \in \mathcal{S}} \; \mathbb{E}_{x \sim \mathcal{D}_c,\, \tau \sim \mathcal{T}} \big[ \mathcal{L}(F(f(\delta; \tau) + x), y_\mathrm{true}) \big] + \lambda R(\delta)$

Here, random scene backgrounds, ambient light ($L \in [30, 60{,}000]$ lux), perspective, object pose, and scale are systematically sampled to ensure perturbations survive real-world noise. Physical constraints $\mathcal{S}$ restrict patterns to feasible semi-transparent dot arrangements compatible with the display (Han et al., 2023).
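The EOT loop above can be sketched on a toy differentiable scorer; the optical pipeline `render`, the transformation distribution, and the linear model are stand-ins for the real camera model and detector, and the sampling ranges are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def render(delta, tau):
    """Toy optical pipeline f(delta; tau): alpha-blend the display pattern
    and add noise (both sampled as part of the transformation tau)."""
    alpha, noise = tau
    return alpha * delta + noise

def eot_attack(x, w, y_true, steps=200, lr=0.1, budget=1.0):
    """Minimize the expected correct-class margin of a linear scorer w
    over random transformations, projecting delta onto an L-inf budget."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        grads = []
        for _ in range(8):  # Monte Carlo samples tau ~ T
            alpha = rng.uniform(0.3, 0.9)
            noise = rng.normal(0.0, 0.05, size=x.shape)
            # margin = y_true * w.(x + alpha*delta + noise);
            # its gradient in delta is y_true * alpha * w
            grads.append(y_true * alpha * w)
        delta -= lr * np.mean(grads, axis=0)
        delta = np.clip(delta, -budget, budget)  # feasible set S
    return delta
```

Averaging the gradient over sampled transformations is what makes the resulting pattern robust to the blending ratio and noise it will actually encounter at display time.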

3. Experimental Methodologies and Evaluation Metrics

Eye-based EvilEye attacks utilize document-type classification in VR with a dataset comprising 20 participants reading comics, newspapers, and textbooks. Gaze at 30 Hz is mapped to 54 high-level features per 45 s window. White-box SVMs (RBF kernel, $C = 1$, $\gamma = 1/54$) and black-box Random Forests (100 trees) are employed, with leave-one-subject-out validation (Hagestedt et al., 2020).
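The leave-one-subject-out protocol reduces to a simple index split; the helper below is an illustrative sketch (the per-fold SVM/RF fitting would plug in where the indices are consumed):

```python
import numpy as np

def leave_one_subject_out(subject_ids):
    """Yield (train_idx, test_idx) pairs, holding out one subject at a
    time, mirroring a 20-participant evaluation protocol."""
    subject_ids = np.asarray(subject_ids)
    for s in np.unique(subject_ids):
        test = np.where(subject_ids == s)[0]
        train = np.where(subject_ids != s)[0]
        yield train, test
```

Each fold trains on 19 subjects and tests on the held-out one, so reported accuracy reflects generalization to unseen users rather than memorized per-subject gaze habits.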

Physical EvilEye evaluations use the LISA-17 traffic sign dataset, a fine-tuned ResNet-50, and detector models including Faster R-CNN (FPN, ResNet-50 backbone), YOLOv3 (Darknet-53), and the Google Vision API. Attack Success Rate (ASR), the fraction of frames misclassified or undetected, is reported across varied illumination and scene backgrounds. Baselines include SLAP projector attacks, static sticker attacks, and standard adversarial patch methods (Han et al., 2023).
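ASR as defined here (fraction of frames misclassified or undetected) can be computed as in this sketch, where `None` stands for a missed detection (an assumed encoding, not from the source):

```python
def attack_success_rate(predictions, true_labels):
    """ASR = fraction of frames where the model output is wrong or missing.

    predictions: per-frame predicted labels, with None meaning the
    detector produced no detection for that frame.
    """
    misses = sum(1 for p, y in zip(predictions, true_labels)
                 if p is None or p != y)
    return misses / len(true_labels)
```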

Experimental results illustrate the effectiveness of minimal FGSM in drastically degrading SVM accuracy ($89\% \rightarrow 24\%$ untargeted), with targeted attacks driving accuracy to $0$–$8\%$ depending on the class flip, and transfer to RF leaving $9$–$58\%$ accuracy; perturbations are statistically indistinguishable from natural gaze noise in most scenarios. Physical EvilEye attacks achieve $>95\%$ ASR against ResNet-50 and YOLOv3 at illuminances above $3{,}000$ lux, in contrast to baseline methods' failure beyond $120$ lux. Cross-model transferability is demonstrated, with ResNet-50-sourced perturbations yielding $96\%$ ASR on Faster R-CNN and $92\%$ on YOLOv3 (Hagestedt et al., 2020, Han et al., 2023).

Attack Performance Tables

Gaze-based attacks, remaining classifier accuracy (Hagestedt et al., 2020):

Method/Condition    White-box SVM (untargeted)   SVM (targeted)   RF transfer (black-box)
Comic → any         24%                          3–8%             9–45%
Newspaper → any     24%                          2–40%            23–58%
Textbook → any      24%                          0–2%             0–51%

Physical attacks by illuminance, ASR (Han et al., 2023):

Illuminance (lux)   ResNet-50 ASR   YOLOv3 ASR
120                 100%            97%
300                 100%            94%
600                 99%             90%
1500                98%             85%
3000                95%             80%

4. Attack Modes: White-Box, Black-Box, Targeted, Dynamic

White-box EvilEye attacks exploit full access to classifier gradients, whereas black-box modes estimate surrogates, yielding partial transferability. Targeted flips (forcing a specific output) give greater control over the misclassification than untargeted approaches. In physical settings, dynamic EvilEye harnesses runtime adaptability, using contextual sensors (GPS, side camera) to trigger class-specific perturbations only for imminent targets, as opposed to static sticker or projector attacks. This dynamic capability raises per-object and per-sign ASR ($\approx 90\%$) compared to static attacks ($\approx 40\%$), and reduces ancillary perturbation exposure in non-target scenes (Han et al., 2023).
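The sensor-conditioned trigger logic can be sketched as a simple proximity gate; the GPS representation, trigger radius, and pattern lookup below are hypothetical simplifications, not details from the source:

```python
def select_pattern(gps_position, target_signs, patterns, radius_m=25.0):
    """Dynamic attack mode: show a class-specific display pattern only
    when a known target sign is within range; otherwise keep the
    transparent display blank.

    gps_position: (x, y) in meters in a local frame (assumed encoding).
    target_signs: list of {"pos": (x, y), "class": label} entries.
    patterns: mapping from sign class to precomputed perturbation.
    """
    for sign in target_signs:
        dx = gps_position[0] - sign["pos"][0]
        dy = gps_position[1] - sign["pos"][1]
        if (dx * dx + dy * dy) ** 0.5 <= radius_m:
            return patterns[sign["class"]]
    return None  # blank display: no perturbation exposure off-target
```

Gating the display this way is what keeps the perturbation invisible in non-target scenes, in contrast to a static sticker that is always present.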

A plausible implication is that practical adversaries will prefer dynamic–contextual strategies for both efficacy and stealth.

5. Defense Strategies

Adversarial training, enriching classifier training sets with synthetic FGSM examples, increases robustness without sacrificing utility (clean test accuracy remains at $88$–$90\%$). Accuracy under untargeted attack improves to $46\%$ with $10\%$ adversarial training, or $56\%$ when $50\%$ of the training set is adversarial. Attacks require larger, more detectable perturbations after retraining. However, the strongest attacks (e.g., newspaper→comic) may remain effective unless adversarial augmentation exceeds $50\%$ (Hagestedt et al., 2020).
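A sketch of the augmentation step, assuming a generic `perturb` callback that applies FGSM to one sample; the `ratio` parameter mirrors the 10%/50% settings above, and all names are illustrative:

```python
import numpy as np

def augment_with_adversarial(X, y, perturb, ratio=0.1, seed=0):
    """Append perturbed copies of a `ratio` fraction of the training set,
    keeping the original labels, for adversarial training.

    perturb: callable (x_i, y_i) -> adversarial x_i (e.g. an FGSM step).
    """
    rng = np.random.default_rng(seed)
    n_adv = int(round(ratio * len(X)))
    idx = rng.choice(len(X), size=n_adv, replace=False)
    X_adv = np.array([perturb(X[i], y[i]) for i in idx])
    return np.concatenate([X, X_adv]), np.concatenate([y, y[idx]])
```

Retraining on the augmented set forces the decision boundary away from the small-perturbation directions FGSM exploits, which is why post-retraining attacks need larger, more detectable $\delta$.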

Physical defenses for EvilEye include neutral-density and narrow-band filters (mitigating but not eliminating attacks up to $60{,}000$ lux), polarization layers, and multi-modal detection. Algorithmic approaches employ physical-domain adversarial training (up to $30\%$ ASR reduction under high lux), input randomization ($\approx 10\%$ ASR drop), and feature squeezing (limited efficacy below $300$ lux). Saliency-based detectors such as SentiNet exhibit high false-negative rates ($\approx 90\%$), as EvilEye perturbations are distributed and blend into scene noise (Han et al., 2023).

6. Limitations, Open Questions, and Practical Considerations

Current EvilEye results are constrained by dataset size and diversity; robust evaluation across broader tasks and subject pools (medical diagnostics, multitask activity recognition) remains an open challenge. Detection of abnormal $\ell_2$ distances in feature space and gradient-based anomaly signatures is an area for further research. The efficacy of defensive distillation and feature-squeezing techniques for non-image, high-level gaze features is untested.

Best practices include integrating fast, FGSM-style self-auditing (processing time $\approx 0.05$ s/sample for gradients, $\approx 5$ min retraining overhead). Defensive strategies must balance classifier utility with heightened perturbation detectability, and ultimately combine robust architectures, real-time anomaly detectors, and hardware–software co-design to close EvilEye vulnerabilities (Hagestedt et al., 2020).

7. Contextual Impact and Future Directions

EvilEye illustrates profound security risks for perception-driven user modelling and autonomous platforms. In digital gaze-based systems, imperceptible perturbations can systematically bias inference about user activities, cognitive states, or health traits. Physical EvilEye attacks demonstrate adversarial robustness under extreme real-world noise, challenging existing defense paradigms and necessitating novel multimodal, physical training and filtering techniques. The dynamic, sensor-level nature of these attacks represents a pivotal development in adversarial perception research.

A plausible implication is that securing future AR/VR and autonomous pipeline architectures will require adversarial-awareness not just at the model layer but across the entire sensing stack. As high-fidelity, dynamic physical attacks become more accessible, routine adversarial training and physical countermeasures must be institutionalized to maintain trustworthy perception and user-modelling.


