
Dynamic Indoor Lighting Attack (DILA)

Updated 24 November 2025
  • Dynamic Indoor Lighting-based Attacks are strategies that use controlled, time-varying indoor illumination to produce imperceptible perturbations in machine vision systems.
  • They employ techniques such as high-frequency LED modulation, discrete light switching, and physics-based temporal relighting to compromise face detectors, navigation agents, and vision-language models.
  • Empirical results demonstrate near-perfect attack success in face recognition and significant performance degradation in navigation and video classification under these dynamic lighting conditions.

Dynamic Indoor Lighting-based Attack (DILA) refers to a class of adversarial strategies that exploit time-varying manipulation of indoor illumination to disrupt, degrade, or extract information from machine vision systems, including deep neural classifiers, face recognition pipelines, vision-language models (VLMs), and navigation agents. Unlike static physical attacks (e.g., stickers, single-frame illumination), DILAs harness temporal variability and physical light modulation to create dynamic, often imperceptible perturbations that exploit both the photometric properties of indoor environments and the temporal response of camera sensors and models. This approach enables a broad attack surface, encompassing both adversarial misclassification and covert-channel data exfiltration, and is applicable in both black-box and white-box threat models.

1. Conceptual Taxonomy and Definition

Dynamic Indoor Lighting-based Attacks represent an evolution in adversarial machine vision research, situated at the intersection of physical adversarial robustness and real-world attack feasibility. The DILA paradigm encompasses methods where the lighting within an indoor scene is varied in a controlled or algorithmically optimized manner over time to yield a sequence of physically plausible, time-dependent perturbations. Key subclasses include:

  • High-frequency temporal illumination modulation: Exploiting imperceptible rapid shifts in LED intensity that are invisible to humans but detectable by rolling-shutter CMOS sensors, resulting in adversarial fringe artifacts.
  • Discrete global lighting schedule manipulation: Alternating the on/off state (or intensity) of room illumination sources at critical decision points, disrupting embodied vision tasks such as navigation.
  • Adversarial relighting with parameterized multi-point illumination: Generating continuous, temporally smooth adversarial lighting trajectories via rendering engines, used to degrade VLM or classifier performance across video frames.

This taxonomy distinguishes DILAs from traditional static sticker or projection attacks, as well as from single-frame global illumination transformations, by highlighting their reliance on temporal, physical, and sometimes device-specific (e.g., sensor-exploiting) properties (Fang et al., 2023, Li et al., 17 Nov 2025, Liu et al., 10 Mar 2025).

2. Physical and Algorithmic Foundations

The mechanisms underlying DILAs leverage the complex interaction between scene lighting, camera sensor response, and neural network feature extraction:

  • LED Modulation with Rolling Shutter Exploitation: In DoS and dodging attacks on face recognition, standard LED fixtures are modulated with high-frequency waveforms $L(t) = L_0 + \Delta L \sin(2\pi f t + \phi)$, with $f > 500\,\text{Hz}$ to evade human perception. The rolling-shutter effect imprints a spatially varying fringe pattern $R(i)$ on the image rows, so that each pixel satisfies $I(i,j) = X(i,j) \cdot R(i)$. The attack parameters $(\Delta L, f, \phi)$ are selected via grid search or greedy heuristics to minimize perceptual change subject to attack success (e.g., face non-detection, or feature-embedding similarity collapsing below threshold) (Fang et al., 2023).
  • Discrete Lighting Switch Scheduling: For embodied agents (VLN), the lighting state $s(t) \in \{0,1\}$ at time $t$ determines the global scene intensity $l_t = s(t)\, l^\star$. A heuristic one-step lookahead flips $s(t)$ exactly when the projected path deviation from the goal increases, maximizing a cumulative distance-to-goal loss $L = \sum_{t=1}^{\hat T} w_t d_t$, where $w_t = t/\hat T$. Greedy scheduling maximizes this non-differentiable attack objective in a black-box model setting (Li et al., 17 Nov 2025).
  • Physics-based Temporal Relighting: For VLMs, each frame is re-rendered as $X'^{(t)} = \mathcal{I}(X^{(t)}, \boldsymbol{\Lambda}^{(t)})$, where $\boldsymbol{\Lambda}^{(t)}$ is the set of point-light parameters. The adversarial objective jointly optimizes over time for (i) maximal model (CLIP/VLM) loss; (ii) perceptual similarity (LPIPS); (iii) diversity of light sources; and (iv) temporal smoothness. Optimization proceeds via black-box methods such as CMA-ES over the concatenated temporal parameter vector (Liu et al., 10 Mar 2025).
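As a concrete illustration of the first mechanism, the following sketch simulates how sinusoidal LED modulation is imprinted as a per-row gain $R(i)$ by a rolling-shutter sensor. The `line_time` row-readout interval and all numeric values are illustrative assumptions, not parameters from the cited paper:

```python
import numpy as np

def rolling_shutter_capture(scene, delta_l=0.05, f=1000.0, phi=0.0,
                            line_time=30e-6):
    """Simulate rolling-shutter capture under sinusoidal LED modulation.

    Each image row i is exposed at a slightly later time t_i = i * line_time,
    so the time-varying illumination L(t) = 1 + delta_l * sin(2*pi*f*t + phi)
    is imprinted as a row-dependent gain R(i): I(i, j) = X(i, j) * R(i).
    line_time (the per-row readout interval) is an assumed sensor parameter.
    """
    rows = scene.shape[0]
    t = np.arange(rows) * line_time                       # exposure time per row
    r = 1.0 + delta_l * np.sin(2 * np.pi * f * t + phi)   # per-row gain R(i)
    return scene * r[:, np.newaxis]                       # broadcast across columns

# A flat grey scene makes the resulting fringe pattern easy to inspect.
scene = np.full((480, 640), 0.5)
img = rolling_shutter_capture(scene, delta_l=0.05, f=1000.0)
print(img.shape, float(img.min()), float(img.max()))
```

Even a 5% modulation depth, invisible to the eye at 1 kHz, produces horizontal brightness fringes across the frame.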

3. Threat Models, Attack Workflows, and Formal Descriptions

DILA scenarios assume varying attacker capabilities and system architectures:

Scenario              Control Lever            Target            Threat Model
LED-based DILA        LED intensity, phase     Face detectors    Physical, black-box
Room lighting DILA    Light on/off at steps    VLN agents        Black-box trajectory
Multilight DILA       Param. point lights      VLMs, CLIP        Black-box, per-frame
LED-based DILA (face recognition):

  1. Pre-calibration: estimate the sensor's rolling-shutter and exposure timing.
  2. Parameter sweep: iterate over $(f, D, \phi)$ for the LED modulation.
  3. Test each candidate:
    • Program the LED and capture a frame.
    • Run face detection/verification in black-box fashion.
    • Retain the parameters if the attack succeeds (e.g., detection failure or evasion).
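The sweep above can be sketched as a black-box grid search. Here `detect_face` and `program_led_and_capture` are hypothetical stand-ins for the target detector and the physical LED/camera rig, replaced below by toy lambdas so the loop is runnable:

```python
import itertools

def led_parameter_sweep(detect_face, program_led_and_capture,
                        freqs, duties, phases):
    """Black-box sweep over LED modulation parameters (f, D, phi).

    detect_face and program_led_and_capture are assumed interfaces to the
    target detector and the physical rig (stubbed out below).  Returns every
    parameter triple for which the attack succeeds, i.e. the detector no
    longer finds a face in the captured frame.
    """
    successes = []
    for f, d, phi in itertools.product(freqs, duties, phases):
        frame = program_led_and_capture(f, d, phi)  # program LED, grab frame
        if not detect_face(frame):                  # attack succeeds on a miss
            successes.append((f, d, phi))
    return successes

# Toy stand-ins: detection "fails" only when modulation is fast and deep.
capture = lambda f, d, phi: {"f": f, "d": d}
detector = lambda frame: not (frame["f"] > 500 and frame["d"] > 0.4)
hits = led_parameter_sweep(detector, capture,
                           freqs=[100, 800, 1600],
                           duties=[0.2, 0.5],
                           phases=[0.0])
print(hits)  # → [(800, 0.5, 0.0), (1600, 0.5, 0.0)]
```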
Room lighting DILA (VLN agents):

  1. At each agent step $t$:
    • Assess the current and flipped lighting observations $I_t(l_t)$ and $I_t(\tilde l_t)$.
    • Query the agent policy under both settings to get the next actions.
    • Simulate the one-step transitions and compute the goal deviation.
    • Flip the light if the switch increases the deviation.
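That one-step-lookahead schedule can be sketched as follows, with `agent_step` and `goal_distance` as assumed black-box interfaces to the simulator (the 1-D toy world at the bottom is purely illustrative):

```python
def greedy_light_schedule(agent_step, goal_distance, state0, horizon):
    """One-step-lookahead lighting-flip schedule.

    agent_step(state, light) -> next_state is an assumed black-box simulator
    interface; goal_distance(state) measures deviation from the goal.  At each
    step, the light s(t) in {0, 1} is flipped exactly when the flipped setting
    yields a larger projected distance-to-goal than keeping the current one.
    """
    light, state, schedule = 1, state0, []
    for _ in range(horizon):
        keep = goal_distance(agent_step(state, light))       # projected deviation
        flip = goal_distance(agent_step(state, 1 - light))
        if flip > keep:                # flipping increases the goal deviation
            light = 1 - light
        schedule.append(light)
        state = agent_step(state, light)
    return schedule

# Toy 1-D world: lights on, the agent moves toward the goal at 0; lights off,
# it stalls — so the greedy adversary keeps the lights off.
step = lambda x, light: x - 1 if light else x
dist = lambda x: abs(x)
print(greedy_light_schedule(step, dist, state0=10, horizon=5))  # → [0, 0, 0, 0, 0]
```

Note the adversary needs only forward queries of the policy, matching the black-box threat model.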
Multilight DILA (VLMs):

  1. Initialize a Gaussian prior over all per-frame light parameters.
  2. Iterate:
    • Sample candidate lighting sequences.
    • Render the adversarial video frames.
    • Evaluate the aggregate loss (adversarial, perceptual, diversity, temporal).
    • Update the sampling distribution (CMA-ES).
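The outer loop can be sketched with a simplified cross-entropy-style optimizer standing in for CMA-ES; the toy loss, hyperparameters, and flattened parameter layout are illustrative assumptions:

```python
import numpy as np

def black_box_relighting(loss, n_frames, n_params, iters=60, pop=32,
                         elite_frac=0.25, sigma0=0.5, seed=0):
    """Population-based black-box optimizer over a concatenated per-frame
    light-parameter vector.  A simplified cross-entropy-method stand-in for
    the CMA-ES loop described above; the caller folds the adversarial,
    perceptual, diversity, and smoothness terms into `loss`."""
    rng = np.random.default_rng(seed)
    dim = n_frames * n_params
    mu = np.zeros(dim)                       # Gaussian prior over parameters
    sigma = np.full(dim, sigma0)
    n_elite = max(2, int(pop * elite_frac))
    for _ in range(iters):
        cand = mu + sigma * rng.standard_normal((pop, dim))  # sample sequences
        scores = np.array([loss(c) for c in cand])           # render + evaluate
        elite = cand[np.argsort(scores)[:n_elite]]           # keep the best
        mu = elite.mean(axis=0)                              # update distribution
        sigma = np.maximum(elite.std(axis=0), 0.05)          # floor keeps exploring
    return mu

# Toy stand-in loss: optimum near all-ones, plus a temporal-smoothness penalty.
def toy_loss(v):
    frames = v.reshape(4, 3)                 # 4 frames x 3 light parameters
    smooth = np.abs(np.diff(frames, axis=0)).sum()
    return float(((frames - 1.0) ** 2).sum() + 0.1 * smooth)

best = black_box_relighting(toy_loss, n_frames=4, n_params=3)
print(round(toy_loss(best), 3))
```

In practice `loss` would render each candidate sequence and query the victim VLM, so only forward access to the model is required.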

4. Empirical Evaluation and Results

Quantitative effectiveness of DILAs has been systematically documented:

  • Face Detection/Verification: With LED-based dynamic modulation, DoS success rates on Dlib, MTCNN, and RetinaFace reach 97.7–100% across pulse periods, and dodging success on FaceNet/ArcFace is 100% at 800 μs pulses. The attack remains effective across realistic unlock distances (18–28 cm). Modulation amplitudes $\Delta L / L_0 < 5\%$ retain human imperceptibility (Fang et al., 2023).
  • VLN Embodied Navigation: On the CHORES benchmark (e.g., ObjectNav task, SPOC agent), DILA in conjunction with static lighting attack (SILA) yields an attack success rate (ASR) of 96.23%, compared to 0% for no attack, 23.23% for random intensity changes, and 60.38% for SILA alone. The episode length under attack increases from ~115 (clean) to ~234 steps, indicating pronounced degradation of navigation efficiency (Li et al., 17 Nov 2025).
  • VLM/CLIP Video Classification and Captioning: DILA reduces OpenCLIP ViT-B/16 top-1 accuracy from 97% (clean) to 44% (DILA best sequence). Captioning consistency (LLaVA-1.6) falls by 12 percentage points. Temporal smoothness constraints ensure the absence of visible flicker, preserving video naturalness (perceptual score >2.1/4) (Liu et al., 10 Mar 2025).
  • Information Leakage/Exfiltration: Dynamic modulation of smart lighting enables audio/video inference (mean-rank ≤2 for 120s observation in audio; ≤1.2 for 360s in video) and IR-based covert channel exfiltration up to 16 bps with BER ~0.03 at 5 m. Performance degrades with distance, ambient light, and physical obstructions (Maiti et al., 2018).

5. Comparative Analysis and Limitations

DILA exhibits several unique strengths and constraints relative to other physical adversarial attacks:

Advantages:

  • High stealth, leveraging imperceptible modulation (>500 Hz) or natural light transitions.
  • Black-box compatibility: no access to model internals or gradients is required.
  • Physical plausibility assured through direct control of real-world light sources and temporal smoothness.

Limitations:

  • Effectiveness contingent on environmental factors (e.g., sunlight may attenuate LED/IR effects).
  • Dependence on hardware: rolling-shutter vulnerability does not extend to global-shutter CMOS.
  • High-dimensional search (e.g., full-frame relighting) can be computationally intensive and may not scale to long sequences or high light-source count.
  • Defensive measures (e.g., temporal filtering, global-shutter sensors, and robust visual encoders) can reduce attack impact, though not eliminate it in all cases (Fang et al., 2023, Li et al., 17 Nov 2025, Maiti et al., 2018).

6. Practical Countermeasures and Future Directions

Research on DILA has spurred exploration of both algorithmic and hardware-level defenses:

  • Temporal Filtering: Applying low-pass filters along image rows suppresses high-frequency flicker artifacts in images captured under adversarial LED modulation, though wide fringes evade such filtering.
  • Domain Randomization: Training agents and models with randomized illumination schedules increases robustness to real-world lighting variation and DILA-like manipulation (Li et al., 17 Nov 2025).
  • Hardware Mitigations: Employing global-shutter sensors or randomized-readout CMOS designs eliminates the spatial encoding of temporal lighting modulations (Fang et al., 2023).
  • Authorization and Dithering: Device-level controls on smart bulbs prevent remote or unauthorized light manipulation; introducing random signal noise complicates attacker inference (Maiti et al., 2018).
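A minimal sketch of the row-direction temporal-filtering defense, assuming a moving-average baseline and a divisive gain correction (the window size is an illustrative tuning parameter, not a value from the cited work):

```python
import numpy as np

def suppress_row_fringes(img, window=31):
    """Estimate the per-row gain left by rolling-shutter flicker and divide
    it out.  The row-mean brightness profile is smoothed with a moving
    average; fringes narrower than the window are removed, while wide
    fringes largely survive, matching the limitation noted above."""
    pad = window // 2
    profile = img.mean(axis=1)                        # per-row brightness
    padded = np.pad(profile, pad, mode="reflect")     # avoid edge bias
    baseline = np.convolve(padded, np.ones(window) / window, mode="valid")
    gain = profile / np.maximum(baseline, 1e-6)       # estimated fringe gain R(i)
    return img / gain[:, np.newaxis]

# Flat scene with a synthetic high-frequency fringe: filtering flattens it.
rows = np.arange(480)
fringe = 1.0 + 0.05 * np.sin(2 * np.pi * rows / 15.0)
img = np.full((480, 640), 0.5) * fringe[:, np.newaxis]
clean = suppress_row_fringes(img)
print(float(img.std()), float(clean.std()))  # fringe std drops sharply
```

Dividing by the estimated gain, rather than subtracting it, matches the multiplicative image-formation model $I(i,j) = X(i,j) \cdot R(i)$ from Section 2.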

A plausible implication is that DILA frameworks will continue to broaden their scope, integrating finer-grained light source control, sensor-aware feedback, and real-time adaptation to environmental conditions. Inference and exfiltration attacks leveraging smart lighting underline the necessity for practitioners to treat scene illumination not as a benign environmental variable, but as an active adversarial channel.

7. Synthesis Across Research Domains

DILA constitutes a convergence point between physical adversarial machine learning, signal processing, and sensor-aware computer vision security, bridging several research communities.

Together, these works underscore the growing consensus that illumination—dynamic, contextual, and physically computable—must be robustly considered in the adversarial threat landscape of machine vision.


References:

  • "Imperceptible Physical Attack against Face Recognition Systems via LED Illumination Modulation" (Fang et al., 2023)
  • "Shedding Light on VLN Robustness: A Black-box Framework for Indoor Lighting-based Adversarial Attack" (Li et al., 17 Nov 2025)
  • "When Lighting Deceives: Exposing Vision-Language Models' Illumination Vulnerability Through Illumination Transformation Attack" (Liu et al., 10 Mar 2025)
  • "Light Ears: Information Leakage via Smart Lights" (Maiti et al., 2018)
  • "Adversarial Color Projection: A Projector-based Physical Attack to DNNs" (Hu et al., 2022)