- The paper presents Sparse-RS, a novel framework employing random search to generate sparse adversarial perturbations in black-box settings.
- The paper demonstrates that Sparse-RS achieves superior performance on l0-bounded, patch, and frame attacks, outperforming even some white-box techniques.
- The paper extends the applicability of Sparse-RS beyond computer vision, showing its versatility in areas like malware detection and benchmark robustness evaluation.
An Analysis of Sparse-RS: A Framework for Query-Efficient Sparse Black-Box Adversarial Attacks
The paper "Sparse-RS: a Versatile Framework for Query-Efficient Sparse Black-Box Adversarial Attacks," authored by Francesco Croce et al., presents a framework tailored for executing adversarial attacks in black-box settings using sparse perturbations with high query efficiency. The framework is notable for its reliance on random search to optimize these perturbations under several sparse threat models: l0-bounded changes, adversarial patches, and adversarial frames.
At its core, Sparse-RS is a randomized algorithm designed to minimize a loss objective, such as the margin between the true class and the highest competing class, in settings where only the target model's output scores are observable and its internal parameters and gradients are inaccessible. This is the score-based black-box setting. At each step, the framework selects which elements of the input to perturb and by how much, balancing the sparsity of the perturbation against the attack's success in misleading the target model.
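The iterate-propose-accept loop described above can be sketched in a few lines. The following is a minimal illustration of random search for an l0-bounded attack, not the authors' exact algorithm: the function names (`sparse_rs_l0`, `model_scores`), the one-coordinate-at-a-time proposal, and the fixed extreme-value scheme are simplifying assumptions made here for clarity.

```python
import numpy as np

def sparse_rs_l0(x, y, model_scores, k=10, n_queries=500, rng=None):
    """Minimal random-search sketch of an l0-bounded black-box attack.

    Illustrative only: the proposal scheme (re-sample one perturbed pixel
    per step) is a simplification of the paper's sampling schedule.
    """
    rng = np.random.default_rng(rng)
    h, w = x.shape

    def loss(img):
        # Margin loss: true-class score minus best other score.
        # Negative value => the model no longer predicts class y.
        s = model_scores(img)
        return s[y] - np.max(np.delete(s, y))

    def apply(idx, vals):
        adv = x.copy().reshape(-1)
        adv[idx] = vals          # perturb at most k pixels
        return adv.reshape(h, w)

    # Initialize: k random pixel positions, each set to an extreme value.
    idx = rng.choice(h * w, size=k, replace=False)
    vals = rng.choice([0.0, 1.0], size=k)
    best = loss(apply(idx, vals))

    for _ in range(n_queries):
        if best < 0:             # success: prediction flipped
            break
        # Propose: move one perturbed pixel to a new position and value.
        cand_idx, cand_vals = idx.copy(), vals.copy()
        j = rng.integers(k)
        cand_idx[j] = rng.integers(h * w)
        cand_vals[j] = rng.choice([0.0, 1.0])
        cand_loss = loss(apply(cand_idx, cand_vals))
        if cand_loss < best:     # greedy acceptance of improvements
            idx, vals, best = cand_idx, cand_vals, cand_loss
    return apply(idx, vals), best
```

Note how the loop queries only the model's scores, never its gradients, and every iterate changes at most k pixels, so the l0 constraint holds by construction.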
Key Numerical Results and Claims
Sparse-RS exhibits strong performance on multiple standard datasets, including MNIST, CIFAR-10, and ImageNet, achieving higher success rates and better query efficiency than existing black-box and even some white-box adversarial techniques. Specifically:
- l0-Bounded Perturbations: Sparse-RS outperforms existing black-box attacks and even surpasses some state-of-the-art white-box techniques in scenarios involving l0-bounded perturbations.
- Adversarial Patches and Frames: In the context of adversarial patches and frames, Sparse-RS achieves high success rates even under stringent settings, such as generating effective attacks with adversarial patches as small as 20×20 pixels and with adversarial frames that are only 2 pixels wide on a 224×224 image.
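The patch threat model above differs from the l0 case in that the perturbed pixels must form a contiguous square, so random search operates on both the patch's content and its location. A minimal sketch under assumed names (`patch_rs`, `model_scores`) and a simplified alternating proposal scheme, not the authors' exact schedule:

```python
import numpy as np

def patch_rs(x, y, model_scores, s=20, n_queries=500, rng=None):
    """Sketch of a black-box patch attack via random search: alternately
    mutate a block of the patch's content and shift its location.
    Illustrative simplification of the patch threat model."""
    rng = np.random.default_rng(rng)
    h, w = x.shape
    patch = rng.random((s, s))                      # random initial patch
    loc = np.array([rng.integers(h - s + 1), rng.integers(w - s + 1)])

    def loss(patch, loc):
        adv = x.copy()
        adv[loc[0]:loc[0] + s, loc[1]:loc[1] + s] = patch
        sc = model_scores(adv)
        return sc[y] - np.max(np.delete(sc, y))     # margin loss

    best = loss(patch, loc)
    for t in range(n_queries):
        if best < 0:                                # prediction flipped
            break
        if t % 2 == 0:
            # Mutate a random square block of the patch's content.
            cand_p, cand_l = patch.copy(), loc
            b = max(1, s // 4)
            i, j = rng.integers(s - b + 1), rng.integers(s - b + 1)
            cand_p[i:i + b, j:j + b] = rng.random()
        else:
            # Shift the patch location by a small random offset.
            cand_p = patch
            cand_l = np.clip(loc + rng.integers(-5, 6, size=2),
                             0, [h - s, w - s])
        cand_loss = loss(cand_p, cand_l)
        if cand_loss < best:                        # greedy acceptance
            patch, loc, best = cand_p, cand_l, cand_loss

    adv = x.copy()
    adv[loc[0]:loc[0] + s, loc[1]:loc[1] + s] = adv[loc[0]:loc[0] + s, loc[1]:loc[1] + s] * 0 + patch
    return adv, best
```

A frame attack follows the same recipe with the perturbable region fixed to a border of the stated width instead of a movable square.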
One of the paper's significant assertions is that the framework effectively generalizes beyond standard computer vision settings to applications such as malware detection, underscoring its versatility. The authors successfully extend the framework to create black-box, targeted universal adversarial patches, showing promising results without depending on transfer attacks from surrogate models.
Implications and Speculations on Future Developments
The implications of Sparse-RS extend into both practical and theoretical arenas within the adversarial learning domain. On a practical level, the framework offers a robust tool for evaluating model robustness against sparse adversarial attacks across diverse security-critical applications, such as autonomous systems and malware classification. This practicality, combined with its high success rate and efficiency, positions Sparse-RS to become a standard benchmark for measuring model resilience against black-box attacks.
Theoretically, Sparse-RS challenges existing assumptions regarding the efficacy of white-box versus black-box attacks, especially in sparse contexts, by demonstrating that black-box attacks can approach, and at times surpass, the performance of white-box techniques. This counters the prevailing narrative that white-box access is necessary for the most potent adversarial attacks and invites future research into refining black-box methods.
For future developments, the flexibility and simplicity of Sparse-RS make it a prime candidate for integration into ensemble methods, potentially increasing adversarial effectiveness against enhanced model defenses. Additionally, the exploration of more sophisticated random search techniques or alternative optimization strategies could improve efficiency further, potentially reducing query complexity to sublinear rates relative to input dimensions, echoing theoretical insights provided in the paper.
In summary, Sparse-RS presents a robust framework poised to significantly advance the state-of-the-art in sparse adversarial attacks across black-box settings. The paper by Croce et al. provides thoughtful insights and a comprehensive evaluation that establishes Sparse-RS as a pivotal contribution to adversarial machine learning research.