Unraveling the Dilemma of AI Errors: Exploring the Effectiveness of Human and Machine Explanations for Large Language Models (2404.07725v1)
Abstract: The field of eXplainable artificial intelligence (XAI) has produced a plethora of methods (e.g., saliency maps) to gain insight into AI models, and has exploded with the rise of deep learning (DL). However, human-participant studies question the efficacy of these methods, particularly when the AI output is wrong. In this study, we collected and analyzed 156 human-generated text and saliency-based explanations from a question-answering task (N=40) and compared them empirically to state-of-the-art XAI explanations (integrated gradients, conservative LRP, and ChatGPT) in a human-participant study (N=136). Our findings show that participants found human saliency maps more helpful for explaining AI answers than machine saliency maps, but that task performance correlated negatively with trust in the AI model and its explanations. This finding points to a dilemma of AI errors in explanation: helpful explanations can lower task performance when they support wrong AI predictions.
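The abstract names integrated gradients as one of the machine saliency methods compared against human explanations. As a concrete illustration, the minimal sketch below shows how token-level saliency for a BERT question-answering model might be computed with Captum's LayerIntegratedGradients. It is not the authors' exact pipeline; the checkpoint name, the example question/context, and the choice of attributing only the answer-start logit are assumptions made for illustration.

```python
# Minimal sketch (not the paper's exact pipeline): token-level saliency for a
# BERT question-answering model via integrated gradients, using Captum.
# The checkpoint name and example inputs below are illustrative assumptions.
import torch
from captum.attr import LayerIntegratedGradients
from transformers import BertForQuestionAnswering, BertTokenizer

MODEL = "bert-large-uncased-whole-word-masking-finetuned-squad"
tokenizer = BertTokenizer.from_pretrained(MODEL)
model = BertForQuestionAnswering.from_pretrained(MODEL)
model.eval()

question = "Who wrote the play?"
context = "The play was written by William Shakespeare around 1603."

# Encode as a single [CLS] question [SEP] context [SEP] sequence.
encoding = tokenizer(question, context, return_tensors="pt")
input_ids = encoding["input_ids"]

# Baseline: keep the special tokens, replace every other token with [PAD].
special = torch.tensor(
    tokenizer.get_special_tokens_mask(input_ids[0].tolist(), already_has_special_tokens=True)
).bool().unsqueeze(0)
baseline_ids = torch.where(special, input_ids, torch.full_like(input_ids, tokenizer.pad_token_id))

def start_logit(ids, attention_mask, token_type_ids):
    # Scalar-per-example forward function: the largest answer-start logit.
    out = model(input_ids=ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
    return out.start_logits.max(dim=1).values

# Attribute the start logit to the embedding layer, then collapse the embedding
# dimension to get one saliency score per input token.
lig = LayerIntegratedGradients(start_logit, model.bert.embeddings)
attributions = lig.attribute(
    inputs=input_ids,
    baselines=baseline_ids,
    additional_forward_args=(encoding["attention_mask"], encoding["token_type_ids"]),
)
scores = attributions.sum(dim=-1).squeeze(0)
scores = scores / scores.norm()  # normalize for display

for token, score in zip(tokenizer.convert_ids_to_tokens(input_ids[0]), scores):
    print(f"{token:>15s} {score.item():+.3f}")
```

Note that this covers only the integrated-gradients condition; the paper's conservative-LRP and ChatGPT explanation conditions would require different tooling (conservative relevance propagation and prompting, respectively).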
Authors: Marvin Pafla, Kate Larson, Mark Hancock