Causal Analysis for Robust Interpretability of Neural Networks (2305.08950v2)
Abstract: Interpreting the inner workings of neural networks is crucial for the trustworthy development and deployment of these black-box models. Prior interpretability methods focus on correlation-based measures to attribute model decisions to individual examples. However, these measures are susceptible to noise and spurious correlations encoded in the model during training (e.g., biased inputs, model overfitting, or misspecification). This process has been shown to yield noisy and unstable attributions that prevent a transparent understanding of the model's behavior. In this paper, we develop a robust interventional method grounded in causal analysis to capture cause-effect mechanisms in pre-trained neural networks and their relation to the prediction. Our approach relies on path interventions to infer the causal mechanisms within hidden layers and isolate the information that is relevant and necessary to the model's prediction while discarding noisy signals. The result is a set of task-specific causal explanatory graphs that can audit model behavior and express the actual causes underlying its performance. We apply our method to vision models trained on image classification tasks and provide extensive quantitative experiments showing that our approach captures more stable and faithful explanations than standard attribution-based methods. Furthermore, the underlying causal graphs reveal the neural interactions in the model, making the method a valuable tool for other applications (e.g., model repair).
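To make the interventional idea concrete, below is a minimal sketch (not the paper's exact path-intervention or graph-construction procedure) of how one can intervene on a hidden activation of a pre-trained vision model and measure the effect on the predicted class probability. The choice of model (`resnet18`), layer (`layer4`), channel index, and the zero-valued intervention are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Load a pre-trained image classifier (illustrative choice; any CNN classifier works).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

def intervene_on_channel(layer, channel, value=0.0):
    """Register a forward hook that clamps one feature-map channel to a constant,
    i.e. an atomic intervention do(h_c = value) on the hidden activation.
    Returns the hook handle so the intervention can be removed afterwards."""
    def hook(module, inputs, output):
        output = output.clone()
        output[:, channel] = value
        return output
    return layer.register_forward_hook(hook)

@torch.no_grad()
def causal_effect(x, layer, channel, target_class):
    """Difference in target-class probability with and without the intervention."""
    p_base = F.softmax(model(x), dim=1)[0, target_class].item()
    handle = intervene_on_channel(layer, channel)
    p_do = F.softmax(model(x), dim=1)[0, target_class].item()
    handle.remove()
    return p_base - p_do

# Example: effect of channel 12 in layer4 on the model's predicted class.
x = torch.randn(1, 3, 224, 224)            # stand-in for a preprocessed image
target = model(x).argmax(dim=1).item()
effect = causal_effect(x, model.layer4, channel=12, target_class=target)
print(f"Estimated effect of layer4/channel 12 on class {target}: {effect:.4f}")
```

Repeating such interventions across units and layers, and keeping only those with a non-negligible effect on the prediction, is one way to assemble the kind of task-specific causal explanatory graph described in the abstract.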
Authors: Ola Ahmad, Nicolas Bereux, Loïc Baret, Vahid Hashemi, Freddy Lecue