COCKATIEL: COntinuous Concept ranKed ATtribution with Interpretable ELements for explaining neural net classifiers on NLP tasks (2305.06754v2)

Published 11 May 2023 in cs.CL and stat.ML

Abstract: Transformer architectures are complex, and although their use in NLP has engendered many successes, it also makes their interpretability and explainability challenging. Recent debates have shown that attention maps and attribution methods are unreliable (Pruthi et al., 2019; Brunner et al., 2019). In this paper, we present some of their limitations and introduce COCKATIEL, which successfully addresses some of them. COCKATIEL is a novel, post-hoc, concept-based, model-agnostic XAI technique that generates meaningful explanations from the last layer of a neural net model trained on an NLP classification task. It uses Non-Negative Matrix Factorization (NMF) to discover the concepts the model leverages to make predictions, and it exploits Sensitivity Analysis to accurately estimate the importance of each of these concepts for the model. It does so without compromising the accuracy of the underlying model or requiring a new one to be trained. We conduct experiments on single- and multi-aspect sentiment analysis tasks and show COCKATIEL's superior ability to discover concepts that align with humans' on Transformer models without any supervision; we objectively verify the faithfulness of its explanations through fidelity metrics, and we showcase its ability to provide meaningful explanations on two different datasets.
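
The abstract compresses COCKATIEL's pipeline into two steps: factorize last-layer activations with NMF to obtain concepts, then rank those concepts with a variance-based Sensitivity Analysis. A minimal sketch of that recipe is given below in Python with NumPy and scikit-learn. It is an illustration under stated assumptions, not the authors' implementation: the `head` callable (the model's classification head restricted to one class logit), the zero-clipping of activations before NMF, and the Jansen-style total-order Sobol estimator are all choices made for this example.

```python
import numpy as np
from sklearn.decomposition import NMF

def discover_concepts(activations: np.ndarray, n_concepts: int = 10):
    """Factorize last-layer activations A ~= U @ W with NMF.

    Rows of W are "concepts"; U holds each example's concept
    coefficients. Transformer activations can be negative, so we
    clip at zero here -- one simple choice, not necessarily the
    paper's exact preprocessing.
    """
    A = np.maximum(activations, 0.0)
    nmf = NMF(n_components=n_concepts, init="nndsvda", max_iter=500)
    U = nmf.fit_transform(A)   # (n_examples, n_concepts)
    W = nmf.components_        # (n_concepts, activation_dim)
    return U, W

def concept_importance(U, W, head, n_designs=64, seed=0):
    """Estimate each concept's total-order Sobol index on the logit.

    Concepts are re-weighted by random masks in [0, 1]; the Jansen
    estimator compares outputs before and after resampling a single
    concept's mask. `head` is a hypothetical callable mapping
    reconstructed activations (n_examples, activation_dim) to logits.
    """
    rng = np.random.default_rng(seed)
    k = U.shape[1]
    m1 = rng.random((n_designs, k))
    m2 = rng.random((n_designs, k))

    def f(masks):
        # Mean class logit over the dataset for each masked design
        # (a global score; the paper estimates per-input importances).
        return np.array([head((U * m) @ W).mean() for m in masks])

    y1 = f(m1)
    var = y1.var() + 1e-12
    totals = np.empty(k)
    for i in range(k):
        mi = m1.copy()
        mi[:, i] = m2[:, i]                 # resample only concept i
        totals[i] = 0.5 * np.mean((y1 - f(mi)) ** 2) / var
    return totals
```

Masking concept coefficients rather than input tokens is what makes the ranking concept-level: a high total index for concept i means the class logit varies strongly whenever that concept's activation is perturbed, interactions included, which is the Sensitivity Analysis step the abstract refers to.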

References (55)
  1. Sanity checks for saliency maps. Advances in neural information processing systems, 31.
  2. Diego Antognini and Boi Faltings. 2021. Rationalization through concepts. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 761–775, Online. Association for Computational Linguistics.
  3. Interpretable neural predictions with differentiable binary variables. arXiv preprint arXiv:1905.08160.
  4. Diane Bouchacourt and Ludovic Denoyer. 2019. Educe: Explaining model decisions through unsupervised concepts extraction. arXiv preprint arXiv:1905.11852.
  5. On identifiability in transformers. arXiv preprint arXiv:1908.04211.
  6. A game theoretic approach to class-wise selective rationalization. Advances in neural information processing systems, 32.
  7. Invariant rationalization. In International Conference on Machine Learning, pages 1448–1458. PMLR.
  8. Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients. I. Theory. The Journal of chemical physics, 59(8):3873–3878.
  9. Bias in bios: A case study of semantic representation bias in a high-stakes setting. In proceedings of the Conference on Fairness, Accountability, and Transparency, pages 120–128.
  10. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  11. Look at the variance! Efficient black-box explanations with Sobol-based sensitivity analysis. Advances in Neural Information Processing Systems, 34.
  12. What i cannot predict, i do not understand: A human-centered evaluation framework for explainability methods. arXiv preprint arXiv:2112.04417.
  13. Craft: Concept recursive activation factorization for explainability. arXiv preprint arXiv:2211.10154.
  14. Mathieu Gerber. 2015. On integration methods based on scrambled nets of arbitrary size. Journal of Complexity, 31(6):798–816.
  15. Interpretation of neural networks is fragile. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 3681–3688.
  16. Towards automatic concept-based explanations. Advances in Neural Information Processing Systems, 32.
  17. Peter Hase and Mohit Bansal. 2020. Evaluating explainable ai: Which algorithmic explanations help users predict model behavior? arXiv preprint arXiv:2005.01831.
  18. Bertrand Iooss and Paul Lemaître. 2015. A review on global sensitivity analysis methods. Uncertainty management in simulation-optimization of complex systems, pages 101–122.
  19. Sarthak Jain and Byron C Wallace. 2019. Attention is not explanation. arXiv preprint arXiv:1902.10186.
  20. Learning to faithfully rationalize by construction. arXiv preprint arXiv:2005.00115.
  21. Asymptotic normality and efficiency of two Sobol index estimators. ESAIM: Probability and Statistics, 18:342–364.
  22. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR.
  23. Mauritz Kop. 2021. Eu artificial intelligence act: The european approach to ai. Stanford-Vienna Transatlantic Technology Law Forum, Transatlantic Antitrust ….
  24. Daniel D Lee and H Sebastian Seung. 1999. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791.
  25. Rationalizing neural predictions. arXiv preprint arXiv:1606.04155.
  26. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
  27. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 142–150, Portland, Oregon, USA. Association for Computational Linguistics.
  28. Cc-news-en: A large English news corpus. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, CIKM ’20, page 3077–3084, New York, NY, USA. Association for Computing Machinery.
  29. Calculations of Sobol indices for the Gaussian process metamodel. Reliability Engineering & System Safety, 94(3):742–751.
  30. Learning attitudes and attributes from multi-aspect reviews. In 2012 IEEE 12th International Conference on Data Mining, pages 1020–1025. IEEE.
  31. Dong Nguyen. 2018. Comparing automatic and human evaluation of local explanations for text classification. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1069–1078.
  32. Art B Owen. 2013. Better estimation of small Sobol’ sensitivity indices. ACM Transactions on Modeling and Computer Simulation (TOMACS), 23(2):1–17.
  33. An information bottleneck approach for controlling conciseness in rationale extraction. arXiv preprint arXiv:2005.00652.
  34. Causal inference in statistics: A primer. John Wiley & Sons.
  35. Elements of causal inference: foundations and learning algorithms. MIT press.
  36. Learning to deceive with attention-based explanations. arXiv preprint arXiv:1909.07913.
  37. A comprehensive comparison of total-order estimators for global sensitivity analysis. International Journal for Uncertainty Quantification, 12(2).
  38. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
  39. Model-agnostic interpretability of machine learning. arXiv preprint arXiv:1606.05386.
  40. Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Computer physics communications, 181(2):259–270.
  41. Sofia Serrano and Noah A Smith. 2019. Is attention interpretable? arXiv preprint arXiv:1906.03731.
  42. Hua Shen and Ting-Hao Huang. 2020. How useful are the machine-generated interpretations to general users? A human evaluation on guessing the incorrectly predicted labels. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, volume 8, pages 168–172.
  43. Deep inside convolutional networks: Visualising image classification models and saliency maps. In Workshop at International Conference on Learning Representations.
  44. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825.
  45. Ilya M Sobol. 1993. Sensitivity analysis for non-linear mathematical models. Mathematical modelling and computational experiment, 1:407–414.
  46. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR.
  47. Random balance designs for the estimation of first order global sensitivity indices. Reliability Engineering & System Safety, 91(6):717–727.
  48. Trieu H Trinh and Quoc V Le. 2018. A simple method for commonsense reasoning. arXiv preprint arXiv:1806.02847.
  49. Attention is all you need. Advances in neural information processing systems, 30.
  50. Gradient-based analysis of nlp models is manipulable. arXiv preprint arXiv:2010.05419.
  51. Ronald J Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3):229–256.
  52. Rethinking cooperative rationalization: Introspective extraction and complement control. arXiv preprint arXiv:1910.13294.
  53. Matthew D Zeiler and Rob Fergus. 2014. Visualizing and understanding convolutional networks. In European conference on computer vision, pages 818–833. Springer.
  54. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 11682–11690.
  55. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. arXiv preprint arXiv:1506.06724.