Does Saliency-Based Training bring Robustness for Deep Neural Networks in Image Classification? (2306.16581v1)
Abstract: Deep Neural Networks are powerful tools for learning complex patterns and making decisions, but their black-box nature impedes a complete understanding of their inner workings. While online saliency-guided training methods try to highlight the prominent features in the model's input to alleviate this problem, it remains unclear whether visually explainable features align with the model's robustness against adversarial examples. In this paper, we investigate the vulnerability of saliency-trained models to adversarial attacks. Models are trained using an online saliency-guided training method and evaluated against popular adversarial-attack algorithms. We quantify the robustness and conclude that, despite the well-explained visualizations they produce, saliency-trained models suffer lower performance under adversarial attacks.
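For concreteness, below is a minimal sketch of the kind of pipeline the abstract describes: a single saliency-guided training step in the spirit of Ismail et al. (2021), in which the lowest-saliency input features are masked and the model is encouraged to produce similar outputs on the original and masked inputs. The masking ratio, the use of uniform noise for masking, and all helper names are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def saliency_guided_step(model, x, y, optimizer, mask_ratio=0.5):
    """One illustrative saliency-guided training step (assumed configuration,
    not the paper's exact setup)."""
    model.train()
    x = x.clone().requires_grad_(True)

    # Input-gradient saliency from a standard cross-entropy forward pass.
    logits = model(x)
    loss_ce = F.cross_entropy(logits, y)
    grads = torch.autograd.grad(loss_ce, x)[0]
    saliency = grads.abs().flatten(1)

    # Replace the k lowest-saliency input features with random values.
    k = int(mask_ratio * saliency.shape[1])
    low_idx = saliency.topk(k, dim=1, largest=False).indices
    x_masked = x.detach().clone().flatten(1)
    noise = torch.rand_like(x_masked)
    x_masked.scatter_(1, low_idx, noise.gather(1, low_idx))
    x_masked = x_masked.view_as(x)

    # Combined objective: cross-entropy plus KL divergence between the
    # model's outputs on the original and masked inputs.
    logits = model(x.detach())
    logits_masked = model(x_masked)
    loss = F.cross_entropy(logits, y) + F.kl_div(
        F.log_softmax(logits_masked, dim=1),
        F.softmax(logits, dim=1),
        reduction="batchmean",
    )

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A trained model can then be probed with standard attacks; assuming the Foolbox 3 API, a robustness check against an L-infinity PGD attack might look like the following (the epsilon value is an illustrative choice).

```python
import foolbox as fb

# Wrap the evaluated PyTorch model; `images` and `labels` are a test batch.
fmodel = fb.PyTorchModel(model.eval(), bounds=(0, 1))
attack = fb.attacks.LinfPGD()
_, _, success = attack(fmodel, images, labels, epsilons=[8 / 255])
print("robust accuracy:", 1 - success.float().mean(dim=-1))
```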