AutoZOOM: Advancing Query Efficiency in Black-box Neural Network Attacks
The paper "AutoZOOM: Autoencoder-based Zeroth Order Optimization Method for Attacking Black-box Neural Networks" introduces a framework named AutoZOOM aimed at improving the query efficiency of black-box adversarial attacks on neural networks. Black-box attacks craft adversarial examples without access to the target model's internal parameters, relying only on its input-output behavior. Under this constraint, gradients must be estimated from model queries, which traditionally requires a very large number of them. AutoZOOM addresses this bottleneck with a dual approach: adaptive random gradient estimation combined with dimension reduction of the attack space.
Methodological Innovations
AutoZOOM is built on two core innovations:
- Adaptive Random Gradient Estimation: AutoZOOM replaces traditional coordinate-wise gradient estimation with a scaled random full-gradient estimate. The method balances query counts against estimation accuracy by adapting the averaging parameter, q, of the gradient estimator throughout the attack. Initially, q=1 is used to reach a first adversarial example with as few queries as possible; after this first success, the method increases q to refine the adversarial example, reducing distortion while keeping the query overhead under control.
- Attack Dimension Reduction via Autoencoders: To mitigate the dimensionality challenge inherent in crafting adversarial perturbations, AutoZOOM optimizes the perturbation in a reduced space and maps it back to the input space with a decoder, using either an autoencoder (AE) or bilinear resizing (BiLIN). The autoencoder is trained offline on unrelated, unlabeled data, so only its decoder is needed at attack time; bilinear resizing provides a computationally efficient, on-the-fly alternative with no training overhead. Operating in the lower-dimensional space significantly cuts the number of queries needed to estimate useful gradients.
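The random gradient estimator in the first bullet can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `f` stands for any black-box scalar attack loss, `beta` is a small smoothing radius, and the scaling factor `d` (the input dimension) is the usual correction for averaged random-direction finite differences; `q` is the averaging parameter that trades extra queries for lower estimator variance.

```python
import numpy as np

def rand_grad_estimate(f, x, q=1, beta=0.01):
    """Averaged random-direction gradient estimate of a black-box loss f at x.

    Each of the q random unit directions costs one extra model query,
    so q directly trades query budget for estimation accuracy.
    """
    d = x.size
    fx = f(x)  # one query for the base point
    g = np.zeros_like(x, dtype=float)
    for _ in range(q):
        u = np.random.randn(*x.shape)
        u /= np.linalg.norm(u)                      # random unit direction
        g += d * (f(x + beta * u) - fx) / beta * u  # scaled finite difference
    return g / q
```

In the adaptive schedule described above, one would call this with q=1 until the first adversarial example is found, then raise q to average out estimation noise while fine-tuning the distortion.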
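The BiLIN option in the second bullet is straightforward to illustrate: the attack optimizes a small perturbation, and a fixed bilinear upscaling plays the role of the decoder, mapping it to the full input resolution with zero training cost. A minimal pure-NumPy sketch (the function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def bilinear_upscale(delta_small, out_h, out_w):
    """Map a low-dimensional perturbation (e.g. 32x32) to the input
    resolution (e.g. 299x299) by bilinear interpolation, serving as a
    training-free stand-in for an autoencoder's decoder."""
    in_h, in_w = delta_small.shape
    # fractional input coordinates for each output pixel (corners aligned)
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    top = delta_small[np.ix_(y0, x0)] * (1 - wx) + delta_small[np.ix_(y0, x1)] * wx
    bot = delta_small[np.ix_(y1, x0)] * (1 - wx) + delta_small[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy
```

The gradient estimator then runs in the small space, and the candidate adversarial image is the original input plus the upscaled perturbation (clipped to the valid pixel range), so the number of variables being optimized shrinks by orders of magnitude.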
Empirical Evaluation
The paper reports compelling results demonstrating the query efficiency gains of AutoZOOM across standard image classification datasets (MNIST, CIFAR-10, and ImageNet). Notably, AutoZOOM achieves at least a 93% reduction in the query counts needed for successful adversarial attacks compared to the state-of-the-art ZOO method, without compromising attack success rates or the visual quality of the adversarial examples. These results underscore the potential of AutoZOOM to efficiently probe the robustness of deep learning models, particularly large networks such as those trained on ImageNet, where the high input dimensionality makes coordinate-wise gradient estimation prohibitively expensive.
Implications and Future Directions
The reduced query cost of AutoZOOM carries significant implications for both offensive and defensive strategies within AI security. On the offensive side, AutoZOOM makes it feasible to perform extensive robustness evaluations of deployed machine learning models at lower operational cost. This equips attackers with a tool to efficiently probe vulnerabilities, potentially motivating more resilient model designs.
From a defensive standpoint, AutoZOOM highlights the importance of evaluating models under adversarial settings, and the need for defenses that account for query-efficient attacks built on dimension reduction and random gradient estimation. The framework also suggests an interesting avenue for future work: learning low-dimensional adversarial representations directly might further strengthen attack strategies. Continued research could investigate integrating other dimension reduction techniques or optimization strategies to amplify attack efficacy.
Overall, AutoZOOM marks a significant advancement in black-box adversarial attacks, marrying efficiency with effectiveness and opening new perspectives on the security threats faced by machine learning models.