- The paper presents a novel randomized search method using square-shaped perturbations to efficiently generate adversarial examples without gradient access.
- It achieves remarkable query efficiency, reducing failure rates and requiring up to three times fewer queries on models like ResNet-50 and VGG-16-BN.
- The approach supports both l∞ and l₂ norms and even occasionally surpasses gradient-based attacks in lowering robust accuracy on adversarially trained models.
Square Attack: A Query-Efficient Black-Box Adversarial Attack via Random Search
Overview
The paper presents the Square Attack, a novel score-based black-box adversarial attack designed primarily for query efficiency. Because it does not rely on local gradient information (or estimates of it), it sidesteps the gradient-masking issues that hamper many other black-box methods. The core mechanism is a randomized search strategy that iteratively applies localized, square-shaped perturbations at random positions. This design keeps the perturbation near the boundary of the feasible set, so the perturbation budget is used effectively at every iteration.
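At a high level, the attack is a greedy random-search loop: propose a localized change, query the model's score, and keep the change only if it improves the objective. The sketch below is a simplified illustration of that loop, not the paper's exact algorithm; the `score` and `sample_update` callables and the `seed` parameter are assumptions introduced here for illustration.

```python
import numpy as np

def random_search_attack(score, x, eps, n_iters, sample_update, seed=0):
    """Greedy random-search loop (simplified sketch): propose a localized
    perturbation, query the score, keep the proposal only if the loss drops."""
    rng = np.random.default_rng(seed)
    delta = rng.choice([-eps, eps], size=x.shape)  # random init within the l_inf budget
    best = score(np.clip(x + delta, 0.0, 1.0))
    for i in range(n_iters):
        cand = sample_update(delta, i, rng)        # propose a localized change
        cand = np.clip(cand, -eps, eps)            # project onto the l_inf ball
        loss = score(np.clip(x + cand, 0.0, 1.0))
        if loss < best:                            # greedy acceptance rule
            delta, best = cand, loss
        if best < 0:                               # margin loss < 0 => misclassified
            break
    return np.clip(x + delta, 0.0, 1.0), best
```

The greedy acceptance rule is what makes the method derivative-free: only scalar score comparisons are needed, never gradients.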
Key Contributions and Methodology
- Algorithm Design: Square Attack is built on random search, a classical derivative-free optimization scheme. Updates are square-shaped regions whose side length shrinks according to a predefined schedule. At each iteration, the attack samples a candidate perturbation, adds it to the current adversarial example, queries the model, and keeps the change only if it improves the attack objective.
- l_inf and l_2 Variants: The attack provides implementations for both l∞ and l2 norms.
- l_inf Variant: Initializes with vertical stripes and iteratively adds square-shaped updates. Each square's perturbation is sampled uniformly from {−2ϵ,2ϵ}.
- l_2 Variant: Initializes with perturbation tiles arranged in a grid; each square-shaped update redistributes perturbation mass between two randomly placed squares, so that the full l2 budget is used while the norm constraint is maintained.
- Theoretical Justification: The paper presents a convergence analysis based on the smoothness of the objective function and justifies the use of square-shaped perturbations, leveraging the sensitivity of neural networks to such localized modifications.
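For the l∞ variant specifically, the stripe initialization and one square-shaped update can be sketched as follows. This is a simplified single-image version under assumed HWC image layout and [0, 1] pixel range; the paper's actual algorithm additionally schedules the area fraction `p` over iterations.

```python
import numpy as np

def vertical_stripes_init(x, eps, rng):
    """Stripe initialization: one +/-eps sign per (column, channel),
    constant along the image height, projected to keep x valid."""
    h, w, c = x.shape
    stripes = rng.choice([-eps, eps], size=(1, w, c))
    return np.clip(x + np.repeat(stripes, h, axis=0), 0.0, 1.0) - x

def square_update_linf(delta, x, eps, p, rng):
    """One l_inf square update (simplified): overwrite a randomly placed
    square of the perturbation with a constant +/-2*eps per channel,
    then project back onto the feasible set."""
    h, w, c = x.shape
    # side length so the square covers roughly a fraction p of the image
    s = max(1, int(round(np.sqrt(p * h * w))))
    row = rng.integers(0, h - s + 1)
    col = rng.integers(0, w - s + 1)
    new = delta.copy()
    signs = rng.choice([-1.0, 1.0], size=c)  # one sign per color channel
    new[row:row + s, col:col + s, :] = 2.0 * eps * signs
    # project so x + delta is a valid image inside the eps-ball
    return np.clip(np.clip(x + new, 0.0, 1.0) - x, -eps, eps)
```

Sampling the square's value from {−2ε, 2ε} and then projecting means each update effectively flips the perturbation sign within the square, keeping the iterate at the boundary of the l∞ ball.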
Experimental Results
- Dataset and Models: Evaluations are conducted on ImageNet, using three models: Inception v3, ResNet-50, and VGG-16-BN.
- Query Efficiency: For untargeted attacks, Square Attack significantly outperforms state-of-the-art methods in both failure rate and average query count. It achieves a 0.0% failure rate on the ResNet-50 and VGG-16-BN models while requiring up to 3 times fewer queries than competing methods such as Bandits, the Parsimonious attack, and SignHunter.
- Targeted Attacks: The targeted version of the attack demonstrates superior performance, achieving 100% success with fewer queries compared to other methods.
- Model Robustness: Notably, the Square Attack occasionally outperforms gradient-based white-box attacks: on state-of-the-art adversarially trained MNIST models, it drives robust accuracy below the best previously reported white-box results.
Implications and Future Work
Practical Implications: The Square Attack has significant implications for the security and robustness evaluation of machine learning models. Its query efficiency and resistance to gradient masking make it a vital tool for realistic adversarial robustness assessments, especially in black-box settings where access to model gradients is restricted.
Theoretical Implications: The convergence guarantees and the detailed justification for square-shaped perturbations provide a strong foundation for randomized search methods in adversarial attack design. The demonstrated robustness to initialization and parameter choices further validates the approach.
Future Research Directions:
- Extension to Other Norms and Constraints: Exploring the application of the Square Attack methodology for different norm constraints or hybrid models combining white-box and black-box aspects.
- Other Data Modalities: Adapting and evaluating the attack on tasks beyond image classification, such as natural language processing and time series analysis.
- Defense Mechanisms: Investigating potential countermeasures that could specifically mitigate the effectiveness of Square Attack, thereby guiding the development of more robust defensive strategies.
Conclusion
The Square Attack introduces a highly effective and query-efficient mechanism for conducting black-box adversarial attacks, presenting a notable advancement over existing state-of-the-art approaches. Its simplicity, coupled with robust theoretical and empirical validation, marks a significant contribution to the field of adversarial machine learning.