- The paper presents a novel randomized search method using square-shaped perturbations to efficiently generate adversarial examples without gradient access.
- It achieves remarkable query efficiency, reducing failure rates and requiring up to three times fewer queries on models like ResNet-50 and VGG-16-BN.
- The approach supports both l∞ and l₂ norms and even occasionally surpasses gradient-based attacks in lowering robust accuracy on adversarially trained models.
Square Attack: A Query-Efficient Black-Box Adversarial Attack via Random Search
Overview
The paper presents the Square Attack, a novel score-based black-box adversarial attack designed primarily for query efficiency. Because it does not rely on local gradient information (or estimates of it), it sidesteps the gradient-masking issues that hamper many other black-box methods. The core mechanism is a randomized search strategy that iteratively applies localized, square-shaped perturbations at random positions. This design keeps the perturbation near the boundary of the feasible set, so the perturbation budget is used effectively at every iteration.
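At a high level, the attack is a greedy random-search loop: propose a localized change, query the model's score, and keep the change only if it improves the objective. The sketch below is a simplified illustration of that loop, not the paper's exact algorithm; the `score` and `sample_update` callables and the `seed` parameter are assumptions introduced here for illustration.

```python
import numpy as np

def random_search_attack(score, x, eps, n_iters, sample_update, seed=0):
    """Greedy random-search loop (simplified sketch): propose a localized
    perturbation, query the score, keep the proposal only if the loss drops."""
    rng = np.random.default_rng(seed)
    delta = rng.choice([-eps, eps], size=x.shape)  # random init within the l_inf budget
    best = score(np.clip(x + delta, 0.0, 1.0))
    for i in range(n_iters):
        cand = sample_update(delta, i, rng)        # propose a localized change
        cand = np.clip(cand, -eps, eps)            # project onto the l_inf ball
        loss = score(np.clip(x + cand, 0.0, 1.0))
        if loss < best:                            # greedy acceptance rule
            delta, best = cand, loss
        if best < 0:                               # margin loss < 0 => misclassified
            break
    return np.clip(x + delta, 0.0, 1.0), best
```

The greedy acceptance rule is what makes the method derivative-free: only scalar score comparisons are needed, never gradients.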
Key Contributions and Methodology
- Algorithm Design: Square Attack is built on random search, a classical derivative-free optimization scheme. Updates are square-shaped regions whose side length shrinks according to a predefined schedule. At each iteration, the attack samples a candidate perturbation, adds it to the current adversarial example, queries the model, and keeps the change only if it improves the attack objective.
- l_inf and l_2 Variants: The attack provides implementations for both l∞ and l2 norms.
- l_inf Variant: Initializes with vertical stripes and iteratively adds square-shaped updates. Each square's perturbation is sampled uniformly from {−2ϵ,2ϵ}.
- l_2 Variant: Initializes with perturbation tiles arranged in a grid; each square-shaped update redistributes perturbation mass between two randomly placed squares, so that the full l2 budget is used while the norm constraint is maintained.
- Theoretical Justification: The paper presents a convergence analysis based on the smoothness of the objective function and justifies the use of square-shaped perturbations, leveraging the sensitivity of neural networks to such localized modifications.
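For the l∞ variant specifically, the stripe initialization and one square-shaped update can be sketched as follows. This is a simplified single-image version under assumed HWC image layout and [0, 1] pixel range; the paper's actual algorithm additionally schedules the area fraction `p` over iterations.

```python
import numpy as np

def vertical_stripes_init(x, eps, rng):
    """Stripe initialization: one +/-eps sign per (column, channel),
    constant along the image height, projected to keep x valid."""
    h, w, c = x.shape
    stripes = rng.choice([-eps, eps], size=(1, w, c))
    return np.clip(x + np.repeat(stripes, h, axis=0), 0.0, 1.0) - x

def square_update_linf(delta, x, eps, p, rng):
    """One l_inf square update (simplified): overwrite a randomly placed
    square of the perturbation with a constant +/-2*eps per channel,
    then project back onto the feasible set."""
    h, w, c = x.shape
    # side length so the square covers roughly a fraction p of the image
    s = max(1, int(round(np.sqrt(p * h * w))))
    row = rng.integers(0, h - s + 1)
    col = rng.integers(0, w - s + 1)
    new = delta.copy()
    signs = rng.choice([-1.0, 1.0], size=c)  # one sign per color channel
    new[row:row + s, col:col + s, :] = 2.0 * eps * signs
    # project so x + delta is a valid image inside the eps-ball
    return np.clip(np.clip(x + new, 0.0, 1.0) - x, -eps, eps)
```

Sampling the square's value from {−2ε, 2ε} and then projecting means each update effectively flips the perturbation sign within the square, keeping the iterate at the boundary of the l∞ ball.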
Experimental Results
- Dataset and Models: Evaluations are conducted on ImageNet, using three models: Inception v3, ResNet-50, and VGG-16-BN.
- Query Efficiency: For untargeted attacks, Square Attack significantly outperforms state-of-the-art methods in both failure rate and average query count. It achieves a 0.0% failure rate on the ResNet-50 and VGG-16-BN models while requiring up to 3 times fewer queries than competing methods such as Bandits, the Parsimonious attack, and SignHunter.
- Targeted Attacks: The targeted version of the attack demonstrates superior performance, achieving 100% success with fewer queries compared to other methods.
- Model Robustness: Notably, the Square Attack occasionally outperforms gradient-based white-box attacks: on state-of-the-art adversarially trained MNIST models, it drives robust accuracy below the best previously reported white-box results.
Implications and Future Work
Practical Implications: The Square Attack has significant implications for the security and robustness evaluation of machine learning models. Its query efficiency and resistance to gradient masking make it a vital tool for realistic adversarial robustness assessments, especially in black-box settings where access to model gradients is restricted.
Theoretical Implications: The convergence guarantees and the detailed justification for square-shaped perturbations provide a strong foundation for randomized search methods in adversarial attack design. The demonstrated robustness to initialization and parameter choices further validates the approach.
Future Research Directions:
- Extension to Other Norms and Constraints: Exploring the application of the Square Attack methodology for different norm constraints or hybrid models combining white-box and black-box aspects.
- Other Data Modalities: Adapting and evaluating the attack on tasks beyond image classification, such as natural language processing and time series analysis.
- Defense Mechanisms: Investigating potential countermeasures that could specifically mitigate the effectiveness of Square Attack, thereby guiding the development of more robust defensive strategies.
Conclusion
The Square Attack introduces a highly effective and query-efficient mechanism for conducting black-box adversarial attacks, presenting a notable advancement over existing state-of-the-art approaches. Its simplicity, coupled with robust theoretical and empirical validation, marks a significant contribution to the field of adversarial machine learning.