Navigating the Structured What-If Spaces: Counterfactual Generation via Structured Diffusion (2312.13616v1)
Abstract: Generating counterfactual explanations is one of the most effective approaches for uncovering the inner workings of black-box neural network models and building user trust. While remarkable strides have been made in generative modeling using diffusion models in domains like vision, their utility for generating counterfactual explanations in structured modalities remains unexplored. In this paper, we introduce the Structured Counterfactual Diffuser (SCD), the first plug-and-play framework leveraging diffusion for generating counterfactual explanations in structured data. SCD learns the underlying data distribution via a diffusion model, which is then guided at test time to generate counterfactuals for an arbitrary black-box model, input, and desired prediction. Our experiments show that our counterfactuals not only exhibit higher plausibility than the existing state of the art but also significantly better proximity and diversity.
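The test-time guidance idea in the abstract — a generative model of the data, steered toward a desired black-box prediction while staying close to the original input — can be sketched as annealed, guidance-driven refinement. The sketch below is illustrative only and is not the paper's SCD implementation: `black_box`, `guidance_grad`, and all hyperparameters are assumptions, the full diffusion reverse process is simplified to iterative guided denoising, and the black box's gradients are approximated by finite differences since a true black box exposes none.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(x):
    """Toy 'black-box' classifier: P(y=1 | x) for a 2-feature input.
    Stands in for any model whose internals are inaccessible."""
    return 1.0 / (1.0 + np.exp(-(x @ np.array([1.5, -2.0]))))

def guidance_grad(x, target, eps=1e-4):
    """Finite-difference estimate of d log P(target | x) / dx.
    A black box gives no analytic gradients, so we probe it."""
    def logp(z):
        p = black_box(z)
        return np.log(p) if target == 1 else np.log(1.0 - p)
    base = logp(x)
    g = np.zeros_like(x)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps
        g[i] = (logp(xp) - base) / eps
    return g

def generate_counterfactual(x0, target, steps=200, noise=0.3, lam=0.1, scale=0.5):
    """Guided refinement: perturb the input with noise, then iteratively
    update it to (a) ascend log P(target | x) under the black box and
    (b) stay close to x0 (proximity), with annealed exploration noise."""
    x = x0 + noise * rng.standard_normal(x0.shape)
    for t in range(steps):
        sigma = noise * max(0.0, 1.0 - t / (0.8 * steps))  # noise anneals to 0
        x = x + scale * guidance_grad(x, target) - lam * (x - x0)
        x = x + sigma * rng.standard_normal(x.shape)
    return x

x0 = np.array([-1.0, 1.0])            # originally classified negative
cf = generate_counterfactual(x0, target=1)
```

Running the sketch flips the toy model's prediction on `cf` to the positive class while keeping `cf` near `x0`; sampling with different seeds yields a diverse set of such counterfactuals, mirroring the proximity and diversity goals the abstract describes.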
- Nishtha Madaan
- Srikanta Bedathur