- The paper introduces a novel framework (BIA) that enhances cross-domain transferability by training a generative model to disrupt the intermediate-layer features of an ImageNet-pretrained surrogate.
- Two plug-in modules, Random Normalization (RN) and Domain-agnostic Attention (DA), narrow the gap between the ImageNet source domain and unseen target domains and boost attack success rates.
- Experimental evaluations demonstrate that BIA outperforms established transfer attacks on both coarse- and fine-grained tasks, exposing vulnerabilities in black-box systems.
An Overview of Beyond ImageNet Attack for Crafting Adversarial Examples in Black-box Domains
The paper "Beyond ImageNet Attack: Towards Crafting Adversarial Examples for Black-box Domains" addresses a significant challenge in the field of adversarial machine learning: the development of adversarial examples that exhibit strong cross-domain transferability, particularly targeting black-box models in unknown classification tasks. This paper presents a novel framework known as Beyond ImageNet Attack (BIA) which leverages only the knowledge of the ImageNet domain—both its data distribution and pre-trained models—to enhance adversarial example transferability to other domains.
Key Components of the BIA Framework
- Generative Adversarial Function: At the core of the BIA framework is a generative model that learns an adversarial function to disrupt the low-level features of input images. Rather than optimizing a domain-specific classification loss, which risks overfitting to the source task, the method perturbs intermediate layers that capture features shared across tasks and models (see the sketch after this list).
- Variants to Narrow Source-Target Domain Gaps:
- Random Normalization (RN): This module simulates data distributions from other domains during training by normalizing inputs with randomly drawn (Gaussian) mean and standard deviation. The added randomness broadens the distributional scenarios the generator sees, increasing its adaptability across domains.
- Domain-agnostic Attention (DA): This module makes feature disruption more robust by applying cross-channel average pooling to intermediate features, letting the generator concentrate on the features that matter even when target-domain features differ markedly from those of the training domain. (Both modules appear in the code sketch below.)
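To make the objective concrete, the following is a minimal PyTorch sketch of the feature-level training loss described above: the generator's output is judged by how far it pushes the intermediate features of an ImageNet-pretrained surrogate, with RN and DA applied along the way. The specific layer choice, the Gaussian parameters of RN, and the exact form of the attention weighting are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of the BIA feature-level objective with RN and DA.
# Layer choice, RN statistics, and the DA weighting are assumptions.
import torch
import torch.nn.functional as F
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"

# ImageNet-pretrained surrogate; only an intermediate feature map is used.
surrogate = models.vgg16(pretrained=True).features[:16].to(device).eval()
for p in surrogate.parameters():
    p.requires_grad_(False)

def random_normalization_stats(device):
    """RN (assumed form): draw a random mean/std per step to mimic the
    input statistics of unseen target domains."""
    mean = 0.5 + 0.1 * torch.randn(1, 3, 1, 1, device=device)
    std = 0.5 + 0.1 * torch.randn(1, 3, 1, 1, device=device).abs()
    return mean, std

def domain_agnostic_attention(feat):
    """DA (assumed form): cross-channel average pooling yields a spatial
    weight map that emphasizes features shared across domains."""
    weight = feat.mean(dim=1, keepdim=True)        # B x 1 x H x W
    return feat * torch.sigmoid(weight)

def bia_loss(x_clean, x_adv):
    """Push the cosine similarity between clean and adversarial
    intermediate features as low as possible (minimize this loss)."""
    mean, std = random_normalization_stats(x_clean.device)
    f_clean = domain_agnostic_attention(surrogate((x_clean - mean) / std))
    f_adv = domain_agnostic_attention(surrogate((x_adv - mean) / std))
    return F.cosine_similarity(f_clean.flatten(1), f_adv.flatten(1)).mean()

# A training step would then bound the generator's output, e.g.
#   x_adv = torch.clamp(x + eps * torch.tanh(generator(x)), 0, 1)
# and minimize bia_loss(x, x_adv) with respect to the generator only.
```

Only the generator is updated in this scheme; the surrogate stays frozen, which is why its parameters are detached above and no task-specific labels are needed.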
Experimental Evaluations and Results
The BIA framework is validated through extensive experiments on both coarse-grained and fine-grained classification tasks, with comparisons against established transfer-attack baselines such as PGD, DIM, DR, SSP, and CDA (a hedged sketch of this kind of black-box evaluation follows the results below). The results are notable:
- In coarse-grained domains, BIA methods outperform existing approaches, with the RN variant showing a notable improvement in attack success rates, indicating its efficacy in handling variations in input distribution across domains.
- For fine-grained tasks, the DA module contributes significantly to the attack success by mitigating biases in feature extraction and achieving higher transferability.
- Even within the source domain, experiments show that BIA variants improve cross-model transferability, indicating that the approach strengthens attacks beyond the cross-domain black-box setting.
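For orientation, here is a hedged sketch of the kind of black-box evaluation behind these results: a generator trained on ImageNet attacks a target classifier it has never seen, and the attack success rate is taken as the fraction of originally correct predictions that flip. The names `generator`, `target_model`, and `test_loader`, and the L_inf budget, are placeholders rather than the paper's exact protocol.

```python
# Hedged sketch of a cross-domain evaluation loop: measure how often the
# black-box target model's prediction flips on perturbed images.
# `target_model`, `generator`, and `test_loader` stand in for a
# target-domain classifier, a trained BIA generator, and its test data.
import torch

@torch.no_grad()
def attack_success_rate(target_model, generator, test_loader, eps=16 / 255, device="cpu"):
    flipped, correct = 0, 0
    target_model.eval()
    generator.eval()
    for x, y in test_loader:
        x, y = x.to(device), y.to(device)
        clean_pred = target_model(x).argmax(dim=1)
        mask = clean_pred == y                 # only count images the model got right
        # Bounded perturbation from the generator (L_inf budget eps).
        x_adv = torch.clamp(x + eps * torch.tanh(generator(x)), 0, 1)
        adv_pred = target_model(x_adv).argmax(dim=1)
        flipped += ((adv_pred != clean_pred) & mask).sum().item()
        correct += mask.sum().item()
    return flipped / max(correct, 1)
```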
Implications and Future Directions
The practical implications of BIA are significant: it demonstrates that deployed models can be vulnerable to adversarial examples generated without any knowledge of their training data or architecture. Model owners therefore need to assess the robustness of their models against cross-domain adversarial perturbations. Theoretically, BIA's ability to exploit generalizable, feature-level representations through a generative model also points to directions for studying and strengthening model security in adversarial settings.
Future work could extend BIA with adaptive generators that adjust their perturbation strategy based on feedback from the target domain. Moreover, combining BIA's feature-level objective with more advanced feature-extraction backbones could yield even greater cross-domain adaptability, further challenging current robustness paradigms in AI systems.
In conclusion, this paper makes a substantial contribution to our understanding of adversarial examples and their transferability across domains. Through its novel methods and thorough experimental validation, it extends the landscape of adversarial attacks and prompts further investigation into both defense strategies and adversarial learning approaches.