From Pixels to Reality: Physical-Digital Patch Attacks on Real-World Camera

Published 30 Mar 2026 in cs.CV | (2603.28425v1)

Abstract: This demonstration presents Digital-Physical Adversarial Attacks (DiPA), a new class of practical adversarial attacks against pervasive camera-based authentication systems, where an attacker displays an adversarial patch directly on a smartphone screen instead of relying on printed artifacts. This digital-only physical presentation enables rapid deployment, removes the need for total-variation regularization, and improves patch transferability in black-box conditions. DiPA leverages an ensemble of state-of-the-art face-recognition models (ArcFace, MagFace, CosFace) to enhance transfer across unseen commercial systems. Our interactive demo shows a real-time dodging attack against a deployed face-recognition camera, preventing authorized users from being recognized while participants dynamically adjust patch patterns and observe immediate effects on the sensing pipeline. We further demonstrate DiPA's superiority over existing physical attacks in terms of success rate, feature-space distortion, and reductions in detection confidence, highlighting critical vulnerabilities at the intersection of mobile devices, pervasive vision, and sensor-driven authentication infrastructures.


Authors (4)

Summary

The paper introduces DiPA, a novel digital-physical adversarial attack that leverages dynamic smartphone screen displays to fool face recognition cameras.
It employs an optimization strategy without total variation regularization, enhancing attack transferability while maintaining face detection.
Experimental results show higher attack success rates than traditional physical attacks, highlighting critical vulnerabilities in real-world authentication systems.

Digital-Physical Adversarial Patch Attacks on Real-World Face Recognition Cameras

Introduction and Problem Context

The proliferation of vision-based authentication systems in pervasive computing creates a critical attack surface. Many face recognition (FR) solutions, both embedded and cloud-based, underpin the access control for edge and IoT devices in smart environments. These systems, however, remain vulnerable to adversarial manipulations that can evade or disrupt authentication. While digital attacks have been extensively studied, physical attacks—especially in unconstrained, practical settings—are more consequential due to their ability to circumvent software-based defenses by exploiting the sensing pipeline, which includes pre-processing artifacts, physical-world noise, and unknown transformations.

Traditional physical attacks have predominantly relied on printed patches or adversarial wearables that require extensive calibration and fail to scale to real-world dynamic settings. Most crucially, existing works do not consider the ubiquity of smartphones as attack vectors in real deployment scenarios.

DiPA: Digital-Physical Adversarial Attacks with Smartphones

This work introduces DiPA, a new class of physical adversarial attack that uses an everyday smartphone screen as the attack surface. The attacker renders high-resolution adversarial patches on the smartphone and presents them to the camera. This modality offers several advantages over prior physical attacks: rapid deployment, no color calibration or print artifacts, high brightness, and patterns that can be changed on the fly.

Figure 1: An overview of the DiPA workflow, demonstrating the end-to-end threat scenario and digital-physical patch generation pipeline.

The proposed system consists of a cloud-based service where users upload their photo and receive a set of adversarial patches tailored for both strict black-box commercial FR cameras and white-box online models. The dynamic rendering capabilities of digital screens eliminate the need for total variation (TV) regularization, enhancing the expressive capacity and transferability of the patch.

Attack Formulation and Optimization Strategies

The attack objective is to reliably cause dodging: authorized users are no longer recognized by the camera, but the face itself remains detectable (i.e., the FR pipeline detects a face yet misidentifies it). This property is significant for pervasive deployments where face detection and identity recognition are decoupled.

The attack pipeline uses an ensemble of strong open-source FR models (ArcFace, MagFace, and CosFace) as surrogates. High-resolution adversarial patches are generated by minimizing the cosine similarity in the embedding space between the patched sample and the reference sample of the same identity. A median pooling mechanism bridges the gap between the resolution at which the patch is displayed and the input resolution of the models.
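The objective described above can be illustrated with a minimal sketch. It is not the authors' implementation: the surrogate models are replaced by hypothetical random linear maps (`make_toy_embedder`), the patch placement and the pooling factor are arbitrary, and only the forward computation of the loss is shown. A real attack would run this inside an automatic-differentiation framework and update the patch by gradient descent.

```python
import numpy as np

def median_pool(patch, k):
    """Downsample an (H, W, C) patch by taking the median of each k x k block."""
    h, w, c = patch.shape
    assert h % k == 0 and w % k == 0, "patch size must be divisible by k"
    blocks = patch.reshape(h // k, k, w // k, k, c)
    return np.median(blocks, axis=(1, 3))

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def ensemble_dodging_loss(patched, reference, embedders):
    """Mean cosine similarity over all surrogates; lower means stronger dodging."""
    sims = [cosine_similarity(f(patched), f(reference)) for f in embedders]
    return sum(sims) / len(sims)

def make_toy_embedder(seed, in_dim, out_dim=16):
    """Hypothetical stand-in for a surrogate FR model: a fixed random linear map."""
    w = np.random.default_rng(seed).standard_normal((out_dim, in_dim))
    return lambda img: w @ img.ravel()

# Toy data: an 8x8 "face" at model input size and a patch at twice that resolution.
rng = np.random.default_rng(0)
reference = rng.random((8, 8, 3))
hi_res_patch = rng.random((8, 16, 3))

# Pool the patch down to model resolution and paste it over the lower half of the face.
patched = reference.copy()
patched[4:8, :, :] = median_pool(hi_res_patch, 2)

embedders = [make_toy_embedder(seed, reference.size) for seed in (1, 2, 3)]
loss = ensemble_dodging_loss(patched, reference, embedders)
print(f"ensemble cosine similarity: {loss:.3f}")
```

Averaging the similarity over several surrogates is one simple way to combine an ensemble; the paper may weight or combine the models differently.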

Two variants are considered:

DiPA+TV: Incorporates TV regularization for spatial smoothness.
DiPA: Regularization is omitted, justified by the fidelity and brightness characteristics of smartphone screens over printed surfaces.
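The difference between the two variants can be sketched as a single weighted term in the objective. The anisotropic form of TV used here and the `tv_weight` parameter are assumptions for illustration; the source does not state which TV formulation or weighting the authors use.

```python
import numpy as np

def total_variation(patch):
    """Anisotropic TV: sum of absolute differences between neighboring pixels."""
    dh = np.abs(patch[1:, :, :] - patch[:-1, :, :]).sum()  # vertical neighbors
    dw = np.abs(patch[:, 1:, :] - patch[:, :-1, :]).sum()  # horizontal neighbors
    return float(dh + dw)

def patch_objective(similarity, patch, tv_weight=0.0):
    """tv_weight == 0 corresponds to DiPA; tv_weight > 0 corresponds to DiPA+TV."""
    return similarity + tv_weight * total_variation(patch)

# A flat patch has zero TV; a checkerboard is maximally "rough" for its size.
flat = np.full((4, 4, 1), 0.5)
checker = (np.indices((4, 4)).sum(axis=0) % 2).astype(float)[..., None]

print(total_variation(flat), total_variation(checker))
print(patch_objective(0.3, checker), patch_objective(0.3, checker, tv_weight=0.01))
```

With the weight set to zero the optimizer is free to use high-frequency patterns such as the checkerboard, which a printer would blur but a phone screen can display faithfully.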

Experimental Characterization

Comprehensive evaluation is conducted with five subjects using both online APIs (Face++, MobileFaceNet) and a closed commercial camera system in a strict black-box regime. For each method and individual, multiple patches are generated and tested over several trials, with performance assessed on:

Embedding space cosine similarity (lower is better)
Physical camera attack success rate (ASR; higher is better)
Maintenance of face presence confidence (to avoid degrading detection)
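As a rough sketch of how these three quantities could be aggregated over a set of trials: the `Trial` record, the `detect_threshold` value, and the rule that a trial only counts as a success when a face is still detected are assumptions made for illustration, not details taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Trial:
    similarity: float       # cosine similarity to the enrolled identity
    recognized: bool        # did the camera still recognize the user?
    face_confidence: float  # detector confidence that a face is present

def summarize(trials, detect_threshold=0.5):
    """Aggregate per-trial measurements into the three reported metrics."""
    detected = [t.face_confidence >= detect_threshold for t in trials]
    # Dodging succeeds when a face is detected but the identity is not recognized.
    successes = [d and not t.recognized for d, t in zip(detected, trials)]
    return {
        "mean_similarity": mean(t.similarity for t in trials),
        "asr": sum(successes) / len(trials),
        "mean_face_confidence": mean(t.face_confidence for t in trials),
    }

# Made-up example values, not results from the paper.
trials = [
    Trial(similarity=0.10, recognized=False, face_confidence=0.90),
    Trial(similarity=0.20, recognized=False, face_confidence=0.30),
    Trial(similarity=0.60, recognized=True, face_confidence=0.95),
    Trial(similarity=0.10, recognized=False, face_confidence=0.80),
]
print(summarize(trials))
```

The second trial is not counted as a success because the face confidence falls below the detection threshold, which reflects the requirement that the face itself remain detectable.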

Results indicate that DiPA and DiPA+TV achieve a higher ASR and a greater reduction in identity similarity than Sibling-Attack, FaceOff, and AdvMask, while maintaining high face detection confidence. The regularization-free variant (DiPA) exhibits higher expressive capacity and better real-world transfer than the TV-regularized variant or printed methods.

Demonstration and Real-World Impact

A web-accessible, hands-on demonstration platform is provided. Attendees generate and physically deploy DiPA patches in situ and observe, in real time, the access control system failing to recognize authorized users while face detection continues to work. The live demo shows how vulnerable smart cameras are to opportunistic attacks carried out with everyday mobile devices.

Figure 2: A DiPA attack conducted in a real scenario, visually confirming that only the digital adversarial pattern delivers the dodging effect without impeding face detection.

The analysis highlights that non-adversarial screen content (solid colors, random patterns) is consistently benign, validating the specificity and robustness of the adversarial mechanism.

Theoretical and Practical Implications

The DiPA modality substantially lowers the barrier for physical adversarial attacks against sensor-driven FR systems. Critically, the attack pipeline requires neither iterative query access nor any knowledge of proprietary system internals; it relies on transferability from surrogate models even under strict black-box deployment. Omitting spatial regularization challenges a prevailing assumption in the physical adversarial attack literature: smoothness constraints are intended to counteract printing noise, and high-resolution digital displays remove that need.

The findings call into question the resilience of current FR deployments in environments where user-held digital devices are ubiquitous. Defenses developed within digital simulation boundaries are insufficiently robust in such contexts.

Future Directions

Key opportunities for advancing both attack and defense research in this space include:

Developing adaptive and context-aware sensor-side defenses, such as real-time anomaly detection for mobile–camera interactions.
Evolving FR pipelines that factor in adversarial physical-domain perturbations introduced by ubiquitous devices.
Exploring generalization of DiPA-like attacks to other sensor modalities and multi-modal authentication setups.

Conclusion

This work demonstrates an operational and overlooked attack surface on commercial face authentication systems through digital-physical patch attacks rendered via smartphones. By validating high transferability and efficacy in strict black-box settings, it exposes critical gaps in prevailing defenses and underscores the necessity for new mitigation strategies tailored to pervasive, interaction-rich sensing ecosystems (2603.28425).
