
The Role of Syntactic Span Preferences in Post-Hoc Explanation Disagreement (2403.19424v1)

Published 28 Mar 2024 in cs.CL and cs.AI

Abstract: Post-hoc explanation methods are an important tool for increasing model transparency for users. Unfortunately, the currently used methods for attributing token importance often yield diverging patterns. In this work, we study potential sources of disagreement across methods from a linguistic perspective. We find that different methods systematically select different classes of words and that methods that agree most with other methods and with humans display similar linguistic preferences. Token-level differences between methods are smoothed out if we compare them on the syntactic span level. We also find higher agreement across methods by estimating the most important spans dynamically instead of relying on a fixed subset of size $k$. We systematically investigate the interaction between $k$ and spans and propose an improved configuration for selecting important tokens.
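The abstract's core comparison can be illustrated with a minimal sketch (not the paper's code): two hypothetical attribution methods are compared by top-$k$ overlap at the token level, and again after aggregating token scores over syntactic spans. The span boundaries, toy scores, and mean aggregation here are illustrative assumptions, not the paper's actual configuration.

```python
# Illustrative sketch: token-level vs. span-level agreement between
# two feature-attribution methods. All values below are made up.

def topk_overlap(a, b, k):
    """Fraction of shared indices in the top-k sets of two score lists."""
    top_a = set(sorted(range(len(a)), key=lambda i: -a[i])[:k])
    top_b = set(sorted(range(len(b)), key=lambda i: -b[i])[:k])
    return len(top_a & top_b) / k

def span_scores(token_scores, spans):
    """Aggregate token attributions to spans (mean pooling, an assumption)."""
    return [sum(token_scores[i] for i in span) / len(span) for span in spans]

# Toy token attributions from two hypothetical methods over 6 tokens.
method_a = [0.9, 0.1, 0.8, 0.2, 0.1, 0.7]
method_b = [0.2, 0.8, 0.9, 0.1, 0.2, 0.6]

# Hand-written syntactic spans (groups of token indices).
spans = [[0, 1], [2, 3], [4, 5]]

tok_agree = topk_overlap(method_a, method_b, k=3)
span_agree = topk_overlap(span_scores(method_a, spans),
                          span_scores(method_b, spans), k=2)
```

In this toy case the methods disagree on individual tokens but select the same spans, mirroring the paper's observation that token-level differences are smoothed out at the span level; the dynamic-$k$ idea would additionally replace the fixed `k` with a per-instance estimate.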
