Data-free Defense of Black Box Models Against Adversarial Attacks (2211.01579v3)
Abstract: Companies often safeguard their trained deep models (i.e., architecture details, learned weights, training details, etc.) from third-party users by exposing them only as black boxes through APIs. Moreover, they may not even provide access to the training data due to proprietary or sensitivity concerns. In this work, we propose a novel defense mechanism for black box models against adversarial attacks in a data-free setup. We construct synthetic data via a generative model and train a surrogate network using model-stealing techniques. To minimize adversarial contamination on perturbed samples, we propose a 'wavelet noise remover' (WNR) that performs discrete wavelet decomposition on input images and carefully selects only a few important coefficients determined by our 'wavelet coefficient selection module' (WCSM). To recover the high-frequency content of the image lost during noise removal via WNR, we further train a 'regenerator' network with the objective of retrieving the coefficients such that the reconstructed image yields predictions on the surrogate model similar to those of the original image. At test time, WNR combined with the trained regenerator network is prepended to the black box network, resulting in a large boost in adversarial accuracy. Our method improves adversarial accuracy on CIFAR-10 by 38.98% and 32.01% against the state-of-the-art Auto Attack compared to the baseline, even when the attacker uses a surrogate architecture (AlexNet-half and AlexNet) similar to the black box architecture (AlexNet) and the same model-stealing strategy as the defender. The code is available at https://github.com/vcl-iisc/data-free-black-box-defense
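The abstract outlines the WNR step: decompose the input with a discrete wavelet transform and keep only a small set of important coefficients before reconstruction. The paper's exact selection criterion (WCSM) and the regenerator network are not reproduced here; the following is a minimal sketch assuming a simple magnitude-based selection rule, using PyWavelets and NumPy. The function name and the `keep_ratio` parameter are illustrative, not the paper's API.

```python
# Minimal sketch of wavelet-based noise removal on a single-channel image.
# Assumes a magnitude-based coefficient selection; the paper's WCSM criterion
# and regenerator network are NOT implemented here.
import numpy as np
import pywt

def wavelet_noise_remover(image: np.ndarray, wavelet: str = "haar",
                          level: int = 2, keep_ratio: float = 0.1) -> np.ndarray:
    """Decompose an image, keep only the largest-magnitude detail
    coefficients, and reconstruct."""
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    approx, details = coeffs[0], coeffs[1:]

    # Pool all detail coefficients to find a global magnitude threshold.
    all_details = np.concatenate([np.abs(band).ravel()
                                  for lvl in details for band in lvl])
    k = max(1, int(keep_ratio * all_details.size))
    threshold = np.sort(all_details)[-k]  # k-th largest magnitude

    # Zero out every detail coefficient below the threshold.
    filtered = [approx]
    for lvl in details:
        filtered.append(tuple(np.where(np.abs(band) >= threshold, band, 0.0)
                              for band in lvl))
    return pywt.waverec2(filtered, wavelet)

# Example: filter a 32x32 grayscale CIFAR-sized input before it reaches the model.
if __name__ == "__main__":
    noisy = np.random.rand(32, 32).astype(np.float32)
    cleaned = wavelet_noise_remover(noisy)
    print(cleaned.shape)  # (32, 32)
```

In the paper's pipeline, a regenerator network would then restore the high-frequency content discarded by this filtering before the image is passed to the black box model; that learned component is omitted from this sketch.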