Distilling Adversarial Robustness Using Heterogeneous Teachers (2402.15586v1)
Abstract: Achieving resiliency against adversarial attacks is necessary before deploying neural network classifiers in domains where misclassification incurs substantial costs, e.g., self-driving cars or medical imaging. Recent work has demonstrated that robustness can be transferred from an adversarially trained teacher to a student model using knowledge distillation. However, current methods perform distillation using a single adversarially trained teacher and a single vanilla teacher, and consider homogeneous architectures (i.e., residual networks) that are susceptible to misclassifying examples from similar adversarial subspaces. In this work, we develop a defense framework against adversarial attacks by distilling adversarial robustness using heterogeneous teachers (DARHT). In DARHT, the student model explicitly represents teacher logits in a student-teacher feature map and leverages multiple teachers that exhibit low adversarial example transferability (i.e., exhibit high performance on dissimilar adversarial examples). Experiments on classification tasks in both white-box and black-box scenarios demonstrate that DARHT achieves state-of-the-art clean and robust accuracies compared with competing adversarial training and distillation methods on the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets. Comparisons with homogeneous and heterogeneous teacher sets suggest that leveraging teachers with low adversarial example transferability increases student model robustness.
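The abstract describes distilling robustness from several heterogeneous, adversarially trained teachers into a single student. The sketch below illustrates the general multi-teacher adversarial distillation recipe only, under stated assumptions (a PGD attack against the student, uniform averaging of teacher soft labels, a single loss weight `alpha`, hypothetical helpers `pgd_attack` and `distill_step`); it does not reproduce DARHT's student-teacher feature map or its teacher weighting.

```python
# Minimal sketch of multi-teacher adversarial robustness distillation (assumed
# form; not the DARHT method). Teachers are assumed to be pretrained,
# adversarially trained models of heterogeneous architectures, in eval mode.
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, eps=8 / 255, step_size=2 / 255, steps=10):
    """L-infinity PGD adversarial examples computed against the student."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()


def distill_step(student, teachers, x, y, optimizer, tau=4.0, alpha=0.5):
    """One step: clean cross-entropy on the student plus an averaged KL term
    toward each teacher's soft labels on the student's adversarial examples."""
    student.eval()                      # avoid BN-statistic updates during attack
    x_adv = pgd_attack(student, x, y)
    student.train()

    logits_clean = student(x)
    logits_adv = student(x_adv)

    # Soft targets from every teacher on the same adversarial batch.
    with torch.no_grad():
        teacher_logits = [t(x_adv) for t in teachers]

    kd_loss = sum(
        F.kl_div(
            F.log_softmax(logits_adv / tau, dim=1),
            F.softmax(t_logits / tau, dim=1),
            reduction="batchmean",
        ) * tau ** 2
        for t_logits in teacher_logits
    ) / len(teachers)

    loss = (1 - alpha) * F.cross_entropy(logits_clean, y) + alpha * kd_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The uniform average over teachers is the simplest choice; using teachers with low mutual adversarial example transferability (as the abstract argues) is what makes the averaged soft labels informative across dissimilar adversarial subspaces.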
Authors: Jieren Deng, Aaron Palmer, Rigel Mahmood, Ethan Rathbun, Jinbo Bi, Kaleel Mahmood, Derek Aguiar