Causal Generative Explainers using Counterfactual Inference: A Case Study on the Morpho-MNIST Dataset (2401.11394v1)
Abstract: In this paper, we propose leveraging causal generative learning as an interpretable tool for explaining image classifiers. Specifically, we present a generative counterfactual inference approach to study the influence of visual features (i.e., pixels) as well as causal factors through generative learning. To this end, we first uncover the most influential pixels on a classifier's decision by varying the value of a causal attribute via counterfactual inference and computing both Shapley and contrastive explanations for counterfactual images with these different attribute values. We then establish a Monte-Carlo mechanism using the generator of a causal generative model in order to adapt Shapley explainers to produce feature importances for the human-interpretable attributes of a causal dataset in the case where a classifier has been trained exclusively on the images of the dataset. Finally, we present optimization methods for creating counterfactual explanations of classifiers by means of counterfactual inference, proposing straightforward approaches for both differentiable and arbitrary classifiers. We use the Morpho-MNIST causal dataset as a case study for exploring our proposed methods for generating counterfactual explanations. We employ visual explanation methods from the OmniXAI open-source toolkit as baselines for comparison with our proposed methods. Using quantitative metrics to measure the interpretability of counterfactual explanations, we find that our proposed methods offer more interpretable explanations than those generated by OmniXAI. This finding suggests that our methods are well suited for generating highly interpretable counterfactual explanations on causal datasets.
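The abstract's Monte-Carlo mechanism for producing Shapley importances of human-interpretable causal attributes can be sketched as follows. This is only an illustration of the general permutation-sampling Shapley estimator composed with a generator, not the paper's implementation: the toy linear `generator` and logistic `classifier` below are stand-ins (their shapes and weights are invented assumptions) for a trained causal generative model and image classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(attrs):
    # Hypothetical stand-in for a causal generative model: maps three
    # causal attributes (e.g. thickness, intensity, slant) to a 4-pixel
    # "image" via a fixed linear basis (values are illustrative only).
    basis = np.array([[1.0, 0.0, 0.0, 0.5],
                      [0.0, 2.0, 0.0, 0.0],
                      [0.0, 0.0, 0.5, 0.0]])
    return attrs @ basis

def classifier(image):
    # Hypothetical image classifier: weighted pixel sum through a sigmoid.
    w = np.array([0.8, -0.4, 1.2, 0.1])
    return 1.0 / (1.0 + np.exp(-(image @ w)))

def mc_shapley(attrs, baseline, n_samples=2000):
    """Monte-Carlo Shapley values of causal attributes with respect to
    the composite model classifier(generator(.)).

    For each sampled permutation, attributes are switched from their
    baseline values to their observed values one at a time, and each
    attribute is credited with the resulting change in classifier output.
    """
    d = len(attrs)
    phi = np.zeros(d)
    for _ in range(n_samples):
        perm = rng.permutation(d)
        current = baseline.copy()
        prev = classifier(generator(current))
        for j in perm:
            current[j] = attrs[j]
            cur = classifier(generator(current))
            phi[j] += cur - prev
            prev = cur
    return phi / n_samples

attrs = np.array([1.0, 0.5, -0.3])   # observed causal attributes
baseline = np.zeros(3)               # reference attribute values
phi = mc_shapley(attrs, baseline)

# Efficiency property of Shapley values: contributions sum to the
# difference in classifier output between observed and baseline inputs.
total = classifier(generator(attrs)) - classifier(generator(baseline))
```

Because each permutation's marginal contributions telescope, the estimated values satisfy the Shapley efficiency property exactly, which gives a cheap sanity check on any such implementation.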
- Variational inference with normalizing flows. In International Conference on Machine Learning, pages 1530–1538. PMLR, 2015.
- Generative adversarial networks. Communications of the ACM, 63(11):139–144, 2020.
- Deep structural causal models for tractable counterfactual inference. Advances in Neural Information Processing Systems, 33:857–869, 2020.
- Evaluating and mitigating bias in image classifiers: A causal perspective using counterfactuals. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 915–924, 2022.
- Bernhard Schölkopf and Julius von Kügelgen. From statistical to causal learning, 2022. URL https://arxiv.org/abs/2204.00607.
- Explainable AI (XAI): Core ideas, techniques, and solutions. ACM Computing Surveys, 55(9):1–33, 2023.
- Conditional generative models for counterfactual explanations. arXiv preprint arXiv:2101.10123, 2021.
- Model-based counterfactual synthesizer for interpretation. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 1964–1974, 2021.
- A unified approach to interpreting model predictions. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 4765–4774. Curran Associates, Inc., 2017. URL http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf.
- Explanations based on the missing: Towards contrastive explanations with pertinent negatives, 2018.
- Normalizing flows: An introduction and review of current methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(11):3964–3979, 2020.
- Morpho-MNIST: Quantitative assessment and diagnostics for representation learning. CoRR, abs/1809.10780, 2018. URL http://arxiv.org/abs/1809.10780.
- Li Deng. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012.
- Adversarial feature learning. arXiv preprint arXiv:1605.09782, 2016.
- Adversarially learned inference. arXiv preprint arXiv:1606.00704, 2016.
- Uncertainty principles of encoding GANs. In International Conference on Machine Learning, pages 3240–3251. PMLR, 2021.
- Explaining visual models by causal attribution. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pages 4167–4175. IEEE, 2019.
- Judea Pearl. Causality. Cambridge University Press, 2009.
- ECINN: Efficient counterfactuals from invertible neural networks. arXiv preprint arXiv:2103.13701, 2021a.
- NICE: Non-linear independent components estimation. arXiv preprint arXiv:1410.8516, 2014.
- Preserving causal constraints in counterfactual explanations for machine learning classifiers. arXiv preprint arXiv:1912.03277, 2019.
- Counterfactual explanations without opening the black box: Automated decisions and the GDPR, 2018.
- Data augmentation via latent space interpolation for image classification. In 2018 24th International Conference on Pattern Recognition (ICPR), pages 728–733. IEEE, 2018.
- Interpretable counterfactual explanations guided by prototypes. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 650–665. Springer, 2021.
- On quantitative evaluations of counterfactuals. arXiv preprint arXiv:2111.00177, 2021b.
- Generating interpretable counterfactual explanations by implicit minimisation of epistemic and aleatoric uncertainties. In International Conference on Artificial Intelligence and Statistics, pages 1756–1764. PMLR, 2021.
- Will Taylor-Melanson
- Zahra Sadeghi
- Stan Matwin