- The paper introduces Universal Litmus Patterns (ULPs) as a novel method to efficiently detect backdoor attacks in CNNs.
- It leverages optimized input patterns, achieving near-perfect AUC scores on datasets like MNIST and CIFAR10 while generalizing across architectures.
- The approach reduces computational cost dramatically compared to standard methods, enabling fast, practical deployment in real-world scenarios.
Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs
The paper presents a novel approach to detecting backdoor attacks, also known as Trojan attacks, in convolutional neural networks (CNNs) through the use of Universal Litmus Patterns (ULPs). These patterns serve as an efficient detection mechanism, requiring only a few forward passes through a CNN to ascertain whether a model is clean or compromised. This methodology represents a valuable contribution to the field of adversarial machine learning, particularly in safeguarding the integrity of pre-trained models.
Methodology
Backdoor attacks poison a model's training data with a trigger pattern so that, at test time, the model misclassifies any input containing the trigger while behaving normally on clean inputs. The paper introduces ULPs, which are optimized input patterns designed to expose this vulnerability. The ULPs are learned jointly with a detection classifier over a training pool of models comprising both compromised and untainted networks; a candidate model's responses to the ULPs are then fed to this classifier to decide whether a backdoor is present. Because the ULPs are optimized to maximize detection efficacy, they generalize across different architectures and unforeseen triggers, outperforming standard detection methods at a fraction of the computational cost.
Evaluation and Results
The authors evaluate the efficacy of ULPs on four standard datasets: the German Traffic Sign Recognition Benchmark (GTSRB), MNIST, CIFAR10, and Tiny-ImageNet. Across thousands of networks trained on these datasets, the detection performance using ULPs achieves an area under the ROC curve (AUC) close to 1 on both CIFAR10 and MNIST. On GTSRB (for ResNet18) and Tiny-ImageNet, the approach achieves AUCs of 0.96 and 0.94, respectively. These results underscore the robustness and precision of ULPs in identifying models compromised by backdoor attacks.
Comparative Analysis
ULPs set a new performance baseline, surpassing existing methods such as Neural Cleanse, which incur O(K²) complexity by testing all input-output class-label combinations, where K is the number of classes. ULPs instead require only O(M) forward passes, where M is the number of litmus patterns. This speed advantage (roughly 20 milliseconds per network for ULPs versus minutes for the baseline) makes ULPs particularly appealing for practical deployment in real-world scenarios.
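The complexity gap is easy to make concrete with a back-of-the-envelope count (the values of K and M below are illustrative choices for this example, not figures from the paper):

```python
# Illustrative cost comparison between an O(K^2) trigger search and
# O(M) litmus-pattern forward passes.
K = 43                  # e.g., GTSRB has 43 traffic-sign classes
M = 10                  # example number of litmus patterns

baseline_ops = K * K    # one search per input-output class-label pair
ulp_ops = M             # one forward pass per litmus pattern

print(baseline_ops, ulp_ops)  # 1849 vs 10
```

Each baseline "operation" is itself an optimization loop over many gradient steps, while each ULP operation is a single forward pass, so the real-world gap is even larger than the raw counts suggest.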
Implications and Future Directions
The introduction of ULPs heralds enhanced security measures for deployed deep learning models, particularly in environments where models are obtained from third parties. ULPs require no access to the training dataset and support the detection of backdoor triggers with minimal prior information, offering a potent tool across various application domains prone to adversarial attacks.
On the theoretical side, these results indicate the potential for further research into the generalizability of ULPs across different network architectures and complex backdoor configurations. Practically, the exploration of adaptive defense mechanisms using ULPs presents a promising avenue for constructing more resilient models against increasingly sophisticated adversarial threats.
In conclusion, Universal Litmus Patterns represent a significant stride in the proactive detection of adversarial attacks, integral to the secure deployment of machine learning technologies across sensitive applications. Their efficacy, efficiency, and adaptability distinguish them as a key addition to the repertoire of security techniques available to researchers and practitioners in the field.