Data Science Principles for Interpretable and Explainable AI (2405.10552v2)

Published 17 May 2024 in stat.ML and cs.LG

Abstract: Society's capacity for algorithmic problem-solving has never been greater. Artificial Intelligence is now applied across more domains than ever, a consequence of powerful abstractions, abundant data, and accessible software. As capabilities have expanded, so have risks, with models often deployed without fully understanding their potential impacts. Interpretable and interactive machine learning aims to make complex models more transparent and controllable, enhancing user agency. This review synthesizes key principles from the growing literature in this field. We first introduce precise vocabulary for discussing interpretability, like the distinction between glass box and explainable models. We then explore connections to classical statistical and design principles, like parsimony and the gulfs of interaction. Basic explainability techniques -- including learned embeddings, integrated gradients, and concept bottlenecks -- are illustrated with a simple case study. We also review criteria for objectively evaluating interpretability approaches. Throughout, we underscore the importance of considering audience goals when designing interactive data-driven systems. Finally, we outline open challenges and discuss the potential role of data science in addressing them. Code to reproduce all examples can be found at https://go.wisc.edu/3k1ewe.

Interpretable and Interactive Machine Learning: Bridging AI and Human Understanding

Introduction

AI has become a vital tool across domains, from predicting astrophysical phenomena to natural language processing. Despite these successes, there are concerns about its risks, particularly when models are deployed without a full understanding of their potential impacts. To mitigate these risks, research on interpretable and interactive machine learning aims to make models more transparent and controllable. This article reviews key principles from the growing field of interpretable AI, summarizing core techniques and their practical applications.

Understanding Interpretable AI

Key Vocabulary

The concept of algorithmic interpretability can be broken down into interpretable models and explainability techniques.

  • Interpretable models, often referred to as "glass box" models, are designed to be transparent and modifiable. Examples include sparse linear models and decision trees. They allow predictions to be traced back to a few understandable components.
  • Explainability techniques aim to enhance our understanding of "black box" models by providing tools to examine their outputs. Techniques like partial dependence plots fall into this category. They enable the analysis of a model's predictions based on various feature values without opening up the "black box."
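
To make the partial dependence idea concrete, the sketch below computes a partial dependence curve by hand for a scikit-learn classifier: one feature is swept over a grid while the others are held at their observed values, and the average predicted probability is recorded. The dataset and model here are illustrative stand-ins, not the paper's case study.

    # Minimal partial dependence computation for a black-box classifier:
    # vary one feature over a grid, hold the others fixed, average predictions.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = make_classification(n_samples=500, n_features=5, random_state=0)
    black_box = GradientBoostingClassifier(random_state=0).fit(X, y)

    feature = 0
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), 20)
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # intervene on a single feature
        avg = black_box.predict_proba(X_mod)[:, 1].mean()
        print(f"x{feature} = {value:+.2f} -> average P(y = 1) = {avg:.3f}")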

Key principles behind interpretable models include:

  • Parsimony: Ensuring the model has a small number of relevant components for easier interpretation.
  • Simulatability: The ease with which one can manually derive predictions from the model.
  • Sensitivity: How strongly predictions change under small perturbations of the input; models whose outputs swing with tiny input changes are harder to interpret and trust (a rough perturbation check is sketched after this list).
  • Interaction Gulfs: The effort a user must spend to translate goals into actions on the system and to evaluate its output, echoing the gulfs of execution and evaluation from direct manipulation interfaces.
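
One way to make the sensitivity principle operational is to perturb the inputs slightly and measure how much the predictions move; smaller movement suggests a model whose behavior is easier to reason about. The check below is an illustrative sketch on synthetic data, not a metric defined in the paper.

    # Rough sensitivity check: how far do predicted probabilities move when
    # the inputs receive small Gaussian perturbations?
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, n_features=10, random_state=1)
    model = LogisticRegression(max_iter=1000).fit(X, y)

    rng = np.random.default_rng(0)
    noise_scale = 0.05 * X.std(axis=0)  # perturbation sized relative to each feature
    X_perturbed = X + rng.normal(scale=noise_scale, size=X.shape)

    shift = np.abs(model.predict_proba(X_perturbed)[:, 1] - model.predict_proba(X)[:, 1])
    print(f"mean |change in P(y = 1)| under perturbation: {shift.mean():.4f}")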

Methods for Interpretability

Several methods can make models more interpretable. The review highlights two main categories: directly interpretable models and explainable AI (XAI) techniques.

  1. Directly Interpretable Models:
    • Sparse Logistic Regression: A linear model fit with a sparsity-inducing penalty (such as the lasso) that drives most coefficients to exactly zero, leaving only a few features to interpret (see the first sketch after this list).
    • Decision Trees: These models use a series of yes/no questions to partition data, making the decision process easy to follow.
  2. Explainable AI (XAI) Techniques:
    • Embedding Visualization: Techniques like Principal Component Analysis (PCA) are used to visualize high-dimensional embeddings in lower dimensions.
    • Integrated Gradients: This technique assigns importance scores to input features by accumulating gradients of the prediction along a path from a baseline input to the observed input (a worked example follows this list).
    • Concept Bottleneck Models (CBMs): These models first predict human-interpretable concepts and then make the final prediction from those concepts alone, so users can inspect the concepts and intervene on them to reason counterfactually.
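
The two directly interpretable models above can be fit in a few lines. The sketch below uses an L1-penalized logistic regression and a depth-limited decision tree on synthetic data; the data, penalty strength, and depth are illustrative choices, not those of the paper.

    # Two glass-box baselines: an L1 (lasso) logistic regression that zeroes
    # out most coefficients, and a shallow decision tree that can be printed
    # as a sequence of yes/no splits.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = make_classification(n_samples=500, n_features=30, n_informative=5,
                               random_state=0)

    sparse_lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
    kept = np.flatnonzero(sparse_lr.coef_[0])
    print(f"non-zero coefficients: {kept.size} of {X.shape[1]}", kept)

    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    print(export_text(tree, feature_names=[f"x{i}" for i in range(X.shape[1])]))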
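
Integrated gradients can also be illustrated without a deep learning framework. For a differentiable model, attributions accumulate the gradient along a straight path from a baseline to the input; the sketch below does this for a toy logistic model whose gradient is available in closed form, approximating the path integral with a Riemann sum.

    # Integrated gradients for f(x) = sigmoid(w.x + b):
    # attribution_i = (x_i - baseline_i) * average of df/dx_i along the path.
    import numpy as np

    rng = np.random.default_rng(0)
    w, b = rng.normal(size=5), 0.3  # toy "trained" model parameters

    def f(x):
        return 1.0 / (1.0 + np.exp(-(x @ w + b)))

    def grad_f(x):
        p = f(x)
        return p * (1.0 - p) * w  # closed-form gradient of the logistic model

    x = rng.normal(size=5)   # input to explain
    baseline = np.zeros(5)   # reference input

    alphas = np.linspace(0.0, 1.0, 101)  # points along the straight path
    grads = np.stack([grad_f(baseline + a * (x - baseline)) for a in alphas])
    attributions = (x - baseline) * grads.mean(axis=0)

    print("attributions:", np.round(attributions, 4))
    # Completeness: attributions should sum (approximately) to f(x) - f(baseline).
    print(attributions.sum(), "vs", f(x) - f(baseline))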

Practical Applications: A Case Study

Simulation Design

To illustrate these concepts, the paper uses a hypothetical longitudinal study of microbiome data, tracking 500 participants over two months to predict a health outcome. This setup mimics the kind of realistic, complex dataset where interpretable machine learning supports analysis and decision-making.
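
A toy stand-in for this setup can be simulated directly: per-participant taxon trajectories sampled daily over two months, with the binary health outcome driven by the time trends of a few taxa. The numbers and generative rules below are made up for illustration and are not the paper's actual simulation.

    # Toy longitudinal "microbiome" simulator: 500 participants, 60 daily
    # samples, 20 taxa; the outcome depends only on the trends of a few taxa,
    # so trend summaries should be predictive.
    import numpy as np

    rng = np.random.default_rng(42)
    n_subjects, n_days, n_taxa = 500, 60, 20
    days = np.arange(n_days)

    slopes = rng.normal(scale=0.05, size=(n_subjects, n_taxa))
    noise = rng.normal(scale=1.0, size=(n_subjects, n_days, n_taxa))
    abundance = slopes[:, None, :] * days[None, :, None] + noise

    logits = 10 * slopes[:, :3].sum(axis=1)  # only the first three taxa matter
    outcome = rng.binomial(1, 1 / (1 + np.exp(-logits)))
    print(abundance.shape, outcome.mean())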

Applying Models

  1. Direct Methods:
    • Sparse logistic regression and decision trees were applied both to raw measurements and to summary statistics of the data. Models trained on summarized features (such as linear trends) outperformed those trained on raw features, showing that a well-chosen representation can improve both performance and interpretability (a sketch of this comparison follows the list).
    • Sparse logistic regression, thanks to its transparency and stability, proved a better fit for interpreting the microbiome data than decision trees, which became overly complex.
  2. XAI Techniques:
    • Transformers: A transformer model trained on the same data showed promise, achieving comparable performance to the best directly interpretable models without engineered features.
    • Techniques like integrated gradients and concept bottleneck models helped decode this more complex model, providing insight into how its predictions were made (a schematic concept-bottleneck sketch also follows this list).
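
The summarized-versus-raw comparison can be reproduced in spirit on the toy simulation above: compute a per-taxon linear trend for each participant and compare a sparse logistic regression fit on those slopes with one fit on the flattened raw series. Data generation is repeated here so the sketch runs on its own; the numbers remain illustrative.

    # Compare sparse logistic regression on raw series vs. on trend summaries.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(42)
    n_subjects, n_days, n_taxa = 500, 60, 20
    days = np.arange(n_days)
    slopes = rng.normal(scale=0.05, size=(n_subjects, n_taxa))
    abundance = slopes[:, None, :] * days[None, :, None] + rng.normal(
        scale=1.0, size=(n_subjects, n_days, n_taxa))
    outcome = rng.binomial(1, 1 / (1 + np.exp(-10 * slopes[:, :3].sum(axis=1))))

    # Per-subject, per-taxon least-squares slope over time (the "linear trend").
    centered = days - days.mean()
    trend = (abundance * centered[None, :, None]).sum(axis=1) / (centered ** 2).sum()

    raw = abundance.reshape(n_subjects, -1)  # flattened raw features
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    for name, features in [("raw series", raw), ("trend summaries", trend)]:
        acc = cross_val_score(model, features, outcome, cv=5).mean()
        print(f"{name:>16}: 5-fold CV accuracy {acc:.3f}")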
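
A concept bottleneck can be mimicked with two stacked classical models: one stage predicts human-readable concepts from the features, and a second stage predicts the outcome from the predicted concepts alone, so editing a concept changes the downstream prediction. The sketch below is a schematic simplification with made-up concepts, not the neural architecture used in the CBM literature or in the paper.

    # Schematic concept bottleneck: features -> predicted concepts -> outcome.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X, _ = make_classification(n_samples=600, n_features=20, random_state=0)

    # Toy ground-truth concept labels and outcome (assumed known at training time).
    concepts = (X[:, :3] + rng.normal(scale=0.5, size=(600, 3)) > 0).astype(int)
    outcome = (concepts.sum(axis=1) >= 2).astype(int)

    # Stage 1: one classifier per concept.  Stage 2: outcome from concepts only.
    concept_models = [LogisticRegression().fit(X, concepts[:, k]) for k in range(3)]
    c_hat = np.column_stack([m.predict(X) for m in concept_models])
    outcome_model = LogisticRegression().fit(c_hat, outcome)

    # Concept-level intervention: flip one predicted concept and re-predict.
    example = c_hat[[0]].copy()
    print("P(y = 1) before:", outcome_model.predict_proba(example)[0, 1])
    example[0, 0] = 1 - example[0, 0]
    print("P(y = 1) after flipping concept 0:", outcome_model.predict_proba(example)[0, 1])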

Evaluation of Interpretability

Evaluating interpretability methods involves both dataset benchmarks and user studies. For instance:

  • Ablation Studies: Removing the features a method deems important and checking whether model performance drops can validate the claimed importance (see the sketch after this list).
  • Synthetic Benchmarks: Creating datasets with known generative rules helps test if interpretability techniques can accurately trace these rules.
  • User Studies: Involving human participants to assess how interpretability tools aid in making better predictions or understanding model functionalities.
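
The ablation idea can be protocolized in a few lines: rank features with an attribution method, delete the top-ranked ones, refit, and compare the accuracy drop against deleting randomly chosen features. The sketch below uses permutation importance as the attribution stand-in and is an illustrative protocol, not the paper's benchmark.

    # Ablation check: does accuracy fall more when the features an explanation
    # method ranks as important are removed than when random features are removed?
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=800, n_features=20, n_informative=5,
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    importance = permutation_importance(model, X_te, y_te, random_state=0).importances_mean
    top = np.argsort(importance)[::-1][:5]                    # "important" features
    rand = np.random.default_rng(0).choice(X.shape[1], size=5, replace=False)

    def accuracy_without(cols):
        keep = np.setdiff1d(np.arange(X.shape[1]), cols)
        refit = RandomForestClassifier(random_state=0).fit(X_tr[:, keep], y_tr)
        return refit.score(X_te[:, keep], y_te)

    print("full model:           ", model.score(X_te, y_te))
    print("drop top-5 important: ", accuracy_without(top))
    print("drop 5 random:        ", accuracy_without(rand))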

Implications and Future Directions

The shift toward multimodal and foundation models calls for interpretability techniques that can provide understanding across data sources. Regulatory pressure and ethical considerations also push for more reliable and interpretable AI, especially in sensitive areas like healthcare and law. Future research is expected to focus on making models not only accurate but also robust, transparent, and interactive, fostering a collaborative environment in which human intuition and machine intelligence reinforce each other.

Interactive interpretability can democratize AI, much as data visualization has helped build a more data-literate society. This synergy between interpretability and interactivity can help AI systems be scrutinized, understood, and trusted across sectors and user groups.

By fostering an AI ecosystem that values both performance and transparency, we can ensure safer deployment of AI technologies, making them powerful allies in decision-making processes.

Authors (1)
  1. Kris Sankaran