- The paper demonstrates a novel method using robust feature alignment that detects up to 96.75% of adversarial attacks on deep neural networks.
- It achieves a correct classification rate of 93% for adversarial images in gray-box settings, outperforming adversarial training by 31.18% on average.
- UnMask’s architecture-agnostic design offers a practical defense solution adaptable to various deep learning models in security-critical applications.
UnMask: Advancements in Adversarial Detection and Defense
The vulnerability of deep neural networks (DNNs) to adversarial attacks has been a pressing concern within the field of AI, necessitating resilient methods to safeguard these models. The paper "UnMask: Adversarial Detection and Defense Through Robust Feature Alignment" introduces a novel framework known as UnMask, designed to address these vulnerabilities via robust feature alignment. The authors present a comprehensive evaluation demonstrating the framework's notable improvement in detecting and defending against adversarial attacks compared to existing methods such as adversarial training.
UnMask addresses the challenge of adversarial perturbations by leveraging robust features that align with human intuition. The mechanism extracts these features from an input image and compares them against the features expected for the predicted class; a misalignment between the two suggests the presence of an adversarial attack. The framework then rectifies misclassifications by using the extracted robust features to reclassify the adversarial image.
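The following is a minimal sketch of this detect-and-defend logic, assuming robust features are represented as sets of part labels and that alignment is scored with a simple set-overlap measure (Jaccard similarity); the feature names, class-to-feature map, and threshold are illustrative placeholders rather than the paper's exact implementation.

```python
# Illustrative UnMask-style detection and defense via feature-set alignment.
# Assumption: a separate robust-feature extractor has already produced a set
# of part labels for the input image.

# Hypothetical expected robust features per class.
EXPECTED_FEATURES = {
    "car":  {"wheel", "headlight", "door", "windshield"},
    "bird": {"beak", "wing", "tail", "leg"},
}

def jaccard(a: set, b: set) -> float:
    """Set overlap in [0, 1]; higher means better feature alignment."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def detect_and_defend(extracted: set, predicted_class: str, threshold: float = 0.5):
    """Flag the prediction as adversarial when the features extracted from the
    image disagree with those expected for the predicted class, and reclassify
    using the class whose expected features best match the extracted ones."""
    alignment = jaccard(extracted, EXPECTED_FEATURES[predicted_class])
    is_adversarial = alignment < threshold
    best_class = max(EXPECTED_FEATURES,
                     key=lambda c: jaccard(extracted, EXPECTED_FEATURES[c]))
    return is_adversarial, best_class

# Example: the classifier predicts "bird", but the image yields car-like parts.
flag, corrected = detect_and_defend({"wheel", "door", "headlight"}, "bird")
print(flag, corrected)  # -> True car
```

A design point worth noting is that detection and defense share one signal: the same feature-alignment score that raises the alarm also points to the most plausible corrected class.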
Empirical evaluations show the efficacy of UnMask against strong adversarial attacks, specifically Projected Gradient Descent (PGD) and MI-FGSM, across multiple attack strengths and conditions. Notably, UnMask detected up to 96.75% of adversarial attacks while maintaining a low false positive rate of 9.66%. It also achieved a correct classification rate of up to 93% for adversarial images in gray-box settings, outperforming adversarial training by an average accuracy improvement of 31.18% across different attack vectors.
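For context, the sketch below shows how L∞ PGD adversarial examples, one of the attack families evaluated, are typically generated; the epsilon, step size, and iteration count are generic example values and are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard L-infinity PGD: repeatedly step in the sign of the loss
    gradient, then project back into the eps-ball around the clean input."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()           # gradient-sign step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to eps-ball
        x_adv = x_adv.clamp(0, 1)                              # keep valid pixel range
    return x_adv.detach()
```

MI-FGSM follows the same template but accumulates a momentum term over normalized gradients and steps in the sign of that accumulated momentum.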
The authors attribute the success of UnMask primarily to its strategy of robust feature alignment, which offers a practical methodology for both explaining and mitigating adversarial perturbations. The approach capitalizes on the fact that it is difficult for an adversary to manipulate all of the underlying robust features simultaneously while misleading the classifier. The framework is architecture-agnostic, allowing it to work in conjunction with various deep learning models without significant modifications to the original network structures.
From a theoretical perspective, the paper extends the understanding of adversarial vulnerability, relating it to a model's reliance on non-robust, easily exploitable features. UnMask counters this by integrating robust features, categorized as γ-robust, into the model's defense mechanism. Unlike non-robust features, these retain their utility even in the presence of adversarial perturbations, which contributes to the model's resilience against attacks.
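For readers unfamiliar with the term, a common formalization of γ-robustness from the robust-features literature is sketched below; the notation (data distribution D, allowed perturbation set Δ, labels y ∈ {−1, +1}) follows that standard convention and is not quoted from the paper.

```latex
% A feature f : X -> R is gamma-robustly useful (for some gamma > 0) if it
% remains positively correlated with the true label y even under worst-case
% perturbations delta drawn from the allowed set Delta:
\mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\inf_{\delta\in\Delta} \, y \cdot f(x+\delta)\right] \;\ge\; \gamma
```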
Practically, UnMask's contribution is significant for applications where model security is paramount, such as autonomous systems or critical diagnostic tools. The method's ability to protect models without extensive retraining or access to proprietary network architectures ensures its viability across diverse fields.
In future research, the integration of UnMask with existing defensive frameworks could be explored for compounded security benefits. Additionally, its application can be broadened to address not only image-based attacks but also adversarial challenges in other domains such as natural language processing and sequential decision-making tasks.
UnMask marks a progressive step toward fortifying neural networks against adversarial threats by equipping them to perceive images more as humans do, balancing the trade-off between robustness and explainability in AI defense mechanisms.