Investigating the Impact of Model Instability on Explanations and Uncertainty (2402.13006v2)

Published 20 Feb 2024 in cs.LG and cs.CL

Abstract: Explainable AI methods facilitate the understanding of model behaviour, yet small, imperceptible perturbations to inputs can vastly distort explanations. Because these explanations are typically evaluated holistically, before model deployment, it is difficult to assess when a particular explanation is trustworthy. Some studies have tried to create confidence estimators for explanations, but none have investigated whether a link exists between uncertainty and explanation quality. We artificially simulate epistemic uncertainty in text input by introducing noise at inference time. In this large-scale empirical study, we insert different levels of noise perturbation and measure the effect on the output of pre-trained LLMs and on different uncertainty metrics. Realistic perturbations have minimal effect on performance and explanations, whereas masking has a drastic effect. We find that high uncertainty does not necessarily imply low explanation plausibility; the correlation between the two metrics can be moderately positive when the model is exposed to noise during training. This suggests that noise-augmented models may be better at identifying salient tokens when uncertain. Furthermore, when predictive and epistemic uncertainty measures are over-confident, the robustness of a saliency map to perturbation can indicate model stability issues. Integrated Gradients shows the greatest overall robustness to perturbation, while still exhibiting model-specific patterns in performance; however, this phenomenon is limited to smaller Transformer-based LLMs.
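
The abstract outlines the core experimental loop: perturb the text input at inference time, then track how predictive/epistemic uncertainty and saliency-based explanations react. The snippet below is a minimal illustrative sketch of that loop, not the authors' pipeline: the checkpoint name, the adjacent-character-swap noise scheme, and MC-dropout predictive entropy are assumptions chosen for the example; the paper's actual perturbation types, models, and uncertainty metrics may differ.

```python
# Illustrative sketch only: simulate inference-time noise on a text input and
# measure predictive entropy under MC dropout. The checkpoint and the
# character-swap noise are assumptions made for this example.
import random
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def perturb(text: str, rate: float = 0.1) -> str:
    """Randomly swap adjacent characters as a crude stand-in for input noise."""
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and random.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

@torch.no_grad()
def mc_dropout_entropy(text: str, n_samples: int = 20) -> float:
    """Predictive entropy of the mean softmax over stochastic forward passes."""
    model.train()  # keep dropout layers active at inference time
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    probs = torch.stack(
        [torch.softmax(model(**inputs).logits, dim=-1) for _ in range(n_samples)]
    ).mean(dim=0)
    return -(probs * probs.clamp_min(1e-12).log()).sum().item()

sentence = "The film was surprisingly moving and well acted."
for noise in (0.0, 0.1, 0.3):
    noisy = perturb(sentence, noise) if noise > 0 else sentence
    print(f"noise={noise:.1f}  entropy={mc_dropout_entropy(noisy):.3f}  text={noisy}")
```

A saliency-robustness check in the same spirit would compute attributions on the clean and perturbed inputs (for instance with Integrated Gradients, as studied in the paper) and compare them, which is roughly the kind of explanation-stability signal the abstract describes.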

Authors (3)
  1. Sara Vera Marjanović (4 papers)
  2. Isabelle Augenstein (131 papers)
  3. Christina Lioma (66 papers)
