- The paper introduces the Boundary Attack, a novel decision-based attack that creates minimal adversarial perturbations without relying on model gradients.
- It utilizes a rejection sampling algorithm with dynamic step size adjustment to efficiently traverse the decision boundary in black-box settings.
- Experimental results on MNIST, CIFAR-10, and ImageNet show perturbation sizes comparable to gradient-based methods, while remaining applicable to real-world black-box systems.
An Overview of Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models
The paper, authored by Wieland Brendel, Jonas Rauber, and Matthias Bethge, addresses the vulnerability of machine learning models to adversarial attacks, focusing on real-world black-box scenarios in which neither internal model details nor confidence scores are accessible. The authors introduce the category of decision-based attacks and propose a novel method, the Boundary Attack, which overcomes these limitations by relying solely on the model's final decision.
Context and Motivation
Numerous high-performance machine learning algorithms, especially in domains like computer vision and speech recognition, are susceptible to adversarial perturbations, i.e., small, often imperceptible modifications of the input that can lead to misclassification. Previous adversarial attacks generally fall into three categories: gradient-based, score-based, and transfer-based attacks. These traditional methods have significant limitations in real-world applications: gradient-based and score-based attacks rely on detailed model information (gradients or confidence scores), while transfer-based attacks require access to training data and the training of substitute models. The Boundary Attack mitigates these limitations by operating solely on the final decision of the model, making it practical for black-box scenarios.
Boundary Attack
The Boundary Attack is a decision-based adversarial attack that initializes from a point already classified as adversarial and then performs a random walk along the decision boundary to minimize the perturbation while keeping the input adversarial. The key attributes of this attack are:
- Conceptual Simplicity: The attack uses a rejection sampling algorithm with a straightforward proposal distribution (a minimal sketch of this loop follows the list below).
- Minimal Hyperparameter Tuning: The attack requires very little hyperparameter adjustment, dynamically tuning only step sizes.
- Flexibility: Unlike gradient-based attacks that depend on model-specific gradients, the Boundary Attack can be applied universally to different machine learning models with various adversarial criteria.
- Compatibility with Real-World Scenarios: The attack is effective against practical systems such as autonomous cars and cloud-based APIs where internal model details are not disclosed.
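The decision-only query model behind these attributes can be made concrete with a short sketch. The Python/NumPy code below is a minimal, untargeted illustration rather than the authors' reference implementation: `predict_label` is a hypothetical black-box that returns only the top-1 class, inputs are assumed to lie in [0, 1], and the noise scale and contraction factor are arbitrary illustrative values.

```python
import numpy as np

def boundary_attack_sketch(x_orig, predict_label, original_label,
                           n_steps=5000, noise_scale=0.05, seed=0):
    """Minimal sketch of a decision-based random walk (untargeted).

    `predict_label(x)` is a hypothetical black-box query returning only the
    model's top-1 class label; no gradients or confidence scores are used.
    """
    rng = np.random.default_rng(seed)

    # Initialization: sample random noise images until one is classified
    # differently from the original, i.e., is already adversarial.
    x_adv = rng.uniform(0.0, 1.0, size=x_orig.shape)
    while predict_label(x_adv) == original_label:
        x_adv = rng.uniform(0.0, 1.0, size=x_orig.shape)

    for _ in range(n_steps):
        # Simple Gaussian proposal followed by a small contraction toward the
        # original image (the full method uses a more refined proposal,
        # sketched further below in the Methodology section).
        candidate = x_adv + noise_scale * rng.standard_normal(x_orig.shape)
        candidate = np.clip(candidate + 0.01 * (x_orig - candidate), 0.0, 1.0)

        # Rejection sampling: accept only if the candidate is still adversarial
        # and closer to the original input than the current point.
        closer = np.linalg.norm(candidate - x_orig) < np.linalg.norm(x_adv - x_orig)
        if closer and predict_label(candidate) != original_label:
            x_adv = candidate
    return x_adv
```

Note that each acceptance decision requires only a single label query, which is what makes this style of attack viable against closed prediction APIs.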
Methodology
The Boundary Attack comprises several stages:
- Initialization: The attack starts from a sample that is already adversarial: in the untargeted setting, a sample drawn from a maximum entropy distribution over valid inputs (e.g., uniform noise over the pixel range); in the targeted setting, an image of the target class.
- Proposal Distribution: In each iteration, a perturbation is drawn from an i.i.d. Gaussian distribution, rescaled and clipped so the candidate stays in the valid input domain, projected onto the hypersphere around the original input (keeping the current distance approximately constant), and then moved a small step towards the original input; the candidate is kept only if it remains adversarial.
- Adversarial Criterion: The criterion for adversarial samples is flexible and can be adapted to different use cases, including targeted and untargeted misclassification.
- Dynamic Step Size Adjustment: Inspired by Trust Region methods, the algorithm dynamically adjusts the sizes of the orthogonal and source-directed steps based on the recent success rate of proposals, as illustrated in the sketch after this list.
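The proposal and step-size mechanics can be sketched as follows. This is a hedged illustration of the orthogonal-step-plus-contraction idea described above, not the paper's exact implementation: the function names and the 0.9/1.1 adjustment factors are assumptions made for clarity, the 50% target acceptance rate mirrors the paper's heuristic for the orthogonal step, and inputs are again taken to lie in [0, 1].

```python
import numpy as np

def orthogonal_proposal(x_adv, x_orig, delta, epsilon, rng):
    """One Boundary-Attack-style proposal: an orthogonal step on the
    hypersphere around the original image, then a small step toward it.

    `delta` scales the orthogonal (exploration) step, `epsilon` the step
    toward the original; both are assumed to be tuned dynamically.
    """
    direction = x_orig - x_adv
    dist = np.linalg.norm(direction)

    # 1. Draw an i.i.d. Gaussian perturbation, scaled relative to the
    #    current distance to the original image.
    eta = rng.standard_normal(x_adv.shape)
    eta *= delta * dist / np.linalg.norm(eta)

    # 2. Project onto the hypersphere around the original: remove the
    #    component along the direction to the original, then rescale so the
    #    distance to the original stays (approximately) the same.
    eta -= np.dot(eta.ravel(), direction.ravel()) / dist**2 * direction
    offset = x_adv + eta - x_orig
    candidate = x_orig + offset * dist / np.linalg.norm(offset)

    # 3. Take a small step toward the original image, shrinking the
    #    distance by a factor of (1 - epsilon).
    candidate = x_orig + (1.0 - epsilon) * (candidate - x_orig)

    # Keep the candidate inside the valid input domain.
    return np.clip(candidate, 0.0, 1.0)

def adjust_step_sizes(delta, epsilon, orth_success_rate, step_success_rate):
    """Trust-region-style adaptation: aim for roughly 50% acceptance of the
    orthogonal step and grow/shrink the source-directed step accordingly.
    The 1.1 / 0.9 factors are illustrative choices, not the paper's values."""
    delta *= 1.1 if orth_success_rate > 0.5 else 0.9
    epsilon *= 1.1 if step_success_rate > 0.5 else 0.9
    return delta, epsilon
```

As the walk approaches a minimal perturbation, the source-directed step size shrinks toward zero, which can serve as a practical stopping criterion.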
Experimental Validation
The authors validate the efficacy of the Boundary Attack on three standard datasets: MNIST, CIFAR-10, and ImageNet, using architectures such as VGG-19, ResNet-50, and Inception-v3. The results are compared with existing gradient-based methods, including FGSM, DeepFool, and the Carlini-Wagner attack. Notably, the Boundary Attack finds adversarial perturbations of comparable size to these white-box attacks, but, lacking gradient information, it requires substantially more forward passes (model queries) to converge.
Robustness and Practical Implications
A notable advantage of the Boundary Attack is its robustness against defenses that rely on gradient masking or gradient obfuscation, such as defensive distillation. Because it uses only the model's final decision, the attack remains effective against these defenses, highlighting the need for more rigorous robustness evaluation. Furthermore, the authors demonstrate the attack's practical applicability by successfully generating adversarial examples against Clarifai's black-box APIs for brand and celebrity recognition.
Conclusion and Future Directions
The Boundary Attack represents a significant development in understanding and mitigating the vulnerability of machine learning models in real-world scenarios. The proposed method's simplicity, flexibility, and robustness underscore the relevance of decision-based attacks in security-critical applications. Future research may focus on refining the proposal distribution through learning methods or adapting the attack for other domains beyond computer vision, enhancing the viability and impact of this research across various machine learning applications. The publication encourages a broader investigation of decision-based attacks to further understand the robustness and security implications for deployed machine learning systems.