Structured Gradient-based Interpretations via Norm-Regularized Adversarial Training (2404.04647v1)
Abstract: Gradient-based saliency maps are widely used to explain the decisions of deep neural network classifiers. However, standard gradient-based interpretation maps, including those produced by the simple gradient and integrated gradients algorithms, often lack desired structure, such as sparsity and connectedness, when applied to real-world computer vision models. A common approach to inducing sparsity in gradient-based saliency maps is to post-process the simple gradient scheme with sparsification or norm-based regularization. A drawback of such post-processing methods is the significant loss in fidelity to the original simple gradient map that they frequently incur. In this work, we propose adversarial training as an in-processing scheme for training neural networks with structured simple gradient maps. We establish a duality relation between the norms used to regularize the adversarial perturbations and the resulting gradient-based maps, and based on this duality we design adversarial training loss functions that promote sparsity and group-sparsity in simple gradient maps. We present numerical results demonstrating the influence of the proposed norm-based adversarial training methods on the simple gradient maps of standard neural network architectures on benchmark image datasets.
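To make the duality concrete: for a training loss $\ell$ linearized around an input $x$, the inner maximization of adversarial training has a closed form given by the dual norm. This is one standard way to read the sparsity connection the abstract describes; the notation below is ours, not taken from the paper.

```latex
\max_{\|\delta\|_q \le \epsilon} \ell(x + \delta)
  \;\approx\; \ell(x) + \max_{\|\delta\|_q \le \epsilon} \delta^\top \nabla_x \ell(x)
  \;=\; \ell(x) + \epsilon \, \|\nabla_x \ell(x)\|_p ,
  \qquad \tfrac{1}{p} + \tfrac{1}{q} = 1 .
```

Taking $q = \infty$ (so $p = 1$) makes $\ell_\infty$-bounded adversarial training act, to first order, as an $\ell_1$ penalty on the simple gradient map, which promotes sparsity; group-structured perturbation budgets dualize to a sum of groupwise $\ell_2$ norms, an $\ell_{2,1}$-type penalty promoting group sparsity. A minimal PyTorch sketch of one such training step, assuming a single-step (FGSM-style) inner maximization and a hypothetical perturbation budget `eps`, might look like:

```python
import torch
import torch.nn.functional as F

def linf_adv_step(model, x, y, eps, optimizer):
    """One l_inf-bounded adversarial training step (FGSM-style sketch).

    By the dual-norm identity above, this implicitly penalizes the l_1
    norm of the input gradient, nudging the model toward sparse
    simple-gradient maps. Illustrative only; the paper's exact loss
    and inner solver may differ.
    """
    x = x.clone().detach().requires_grad_(True)
    clean_loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(clean_loss, x)[0]
    # Linearized worst case within the l_inf ball: eps * sign(gradient).
    x_adv = (x + eps * grad.sign()).detach()
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(x_adv), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```

A multi-step (PGD-style) inner maximization or a group-structured perturbation budget can be substituted in the same loop; only the construction of `x_adv` changes.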