
On Explaining with Attention Matrices

Published 24 Oct 2024 in cs.CL and cs.AI (arXiv:2410.18541v1)

Abstract: This paper explores the much-discussed possible explanatory link between attention weights (AW) in transformer models and predicted output. Contrary to intuition and early research on attention, more recent work has provided formal arguments and empirical evidence that AW are not explanatorily relevant. We show that the formal arguments are incorrect. We introduce and effectively compute efficient attention, which isolates the effective components of attention matrices in tasks and models in which AW play an explanatory role. We show that efficient attention has a causal role (provides minimally necessary and sufficient conditions) for predicting model output in NLP tasks requiring contextual information, and we show, contrary to [7], that efficient attention matrices are probability distributions and are effectively calculable. Thus, they should play an important part in the explanation of attention-based model behavior. We offer empirical experiments in support of our method, illustrating properties of efficient attention with a range of metrics on four datasets.
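
To make the abstract's claim concrete, the sketch below illustrates the "effective attention" decomposition from [7], which the paper's efficient attention refines and argues against: the part of an attention matrix A lying in the left null space of the value matrix V cannot affect the layer output A @ V, so only the remaining component is explanatorily active. This is a minimal illustration, not the authors' implementation; all shapes, names, and the toy data are assumptions, and the paper's own construction of efficient attention (which, per the abstract, recovers genuine probability distributions) is not spelled out in the abstract and so is not reproduced here.

```python
# Minimal sketch of the "effective attention" decomposition of [7]
# (Brunner et al.), which this paper's "efficient attention" builds on.
# Shapes and names are illustrative assumptions, not the paper's code.
import numpy as np

def softmax_attention(Q, K):
    """Standard scaled dot-product attention weights; rows are probability
    distributions (non-negative, summing to 1)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

def effective_attention(A, V, tol=1e-10):
    """Remove the component A_null of A with A_null @ V == 0, i.e. the part
    of the attention matrix that cannot influence the output A @ V."""
    # Orthonormal basis of the left null space of V via full SVD.
    U, s, _ = np.linalg.svd(V, full_matrices=True)
    rank = int((s > tol).sum())
    B = U[:, rank:]               # columns span null(V^T)
    A_null = A @ B @ B.T          # component satisfying A_null @ V ~ 0
    return A - A_null             # "effective" component: same output

rng = np.random.default_rng(0)
n, d = 6, 4                        # n tokens, head dimension d < n
Q, K, V = rng.normal(size=(3, n, d))
A = softmax_attention(Q, K)
A_eff = effective_attention(A, V)

# The decomposition preserves the layer output exactly ...
assert np.allclose(A @ V, A_eff @ V)
# ... but, as [7] observed, rows of A_eff need not be probability
# distributions (entries can be negative, rows need not sum to 1).
# The paper's efficient attention is claimed, contrary to [7], to yield
# matrices that *are* probability distributions while remaining
# effectively calculable.
print(A_eff.min(), A_eff.sum(axis=-1))
```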

References (47)
  1. Wasserstein generative adversarial networks. In International Conference on Machine Learning, pages 214–223. PMLR, 2017.
  2. MARTA: Leveraging human rationales for explainable text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 5868–5876, 2021.
  3. Counterfactual models for fair and adequate explanations. Machine Learning and Knowledge Extraction, 4(2):316–349, 2022.
  4. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
  5. Is attention explanation? An introduction to the debate. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3889–3900, 2022.
  6. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
  7. On identifiability in transformers. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=BJg1f6EFDB.
  8. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), 2014.
  9. G. Chrysostomou and N. Aletras. Improving the faithfulness of attention-based explanations with task-specific information for text classification. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 477–488, 2021.
  10. What does BERT look at? An analysis of BERT's attention. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 276–286, 2019.
  11. A. de Santana Correia and E. L. Colombini. Attention, please! A survey of neural attention models in deep learning. Artificial Intelligence Review, 55(8):6037–6124, 2022.
  12. ERASER: A benchmark to evaluate rationalized NLP models. arXiv preprint arXiv:1911.03429, 2019.
  13. Differential equations of applied mathematics. John Wiley and Sons, 1966.
  14. CRAFT: Concept recursive activation factorization for explainability. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2711–2721, 2023.
  15. Measuring the mixing of contextual information in the transformer. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 8698–8714, 2022.
  16. Explaining how transformers use context to build predictions. arXiv preprint arXiv:2305.12535, 2023.
  17. Attention in natural language processing. IEEE Transactions on Neural Networks and Learning Systems, 32(10):4291–4308, 2020.
  18. Causal abstractions of neural networks. Advances in Neural Information Processing Systems, 34:9574–9586, 2021.
  19. Transformer feed-forward layers are key-value memories. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5484–5495, 2021.
  20. Interpreting recurrent and attention-based neural models: A case study on natural language inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4952–4957, 2018.
  21. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
  22. Cost-effective interactive attention learning with neural attention processes. In Proceedings of the 37th International Conference on Machine Learning, pages 4228–4238, 2020.
  23. A. Jacovi and Y. Goldberg. Towards faithfully interpretable NLP systems: How should we define and evaluate faithfulness? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4198–4205, 2020.
  24. S. Jain and B. C. Wallace. Attention is not explanation. In Proceedings of NAACL-HLT, pages 3543–3556, 2019.
  25. Are fairness metric scores enough to assess discrimination biases in machine learning? arXiv preprint arXiv:2306.05307, 2023.
  26. Incorporating residual and normalization layers into analysis of masked language models. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2021.
  27. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791, 1999.
  28. D. Lewis. Counterfactuals. Basil Blackwell, Oxford, 1973.
  29. D. Lewis. A subjectivist's guide to objective chance. In IFS: Conditionals, Belief, Decision, Chance and Time, pages 267–297. Springer, 1981.
  30. L. E. Loeb. Causal theories and causal overdetermination. The Journal of Philosophy, 71(15):525–544, 1974.
  31. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 142–150, 2011.
  32. Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems, 35:17359–17372, 2022.
  33. Zoom in: An introduction to circuits. Distill, 5(3):e00024–001, 2020.
  34. V. M. Panaretos and Y. Zemel. Statistical aspects of Wasserstein distances. Annual Review of Statistics and Its Application, 6:405–431, 2019.
  35. Attention is Turing-complete. Journal of Machine Learning Research, 22(75):1–35, 2021.
  36. Toward transparent AI: A survey on interpreting the inner structures of deep neural networks. In 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), pages 464–483. IEEE, 2023.
  37. C. Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
  38. S. Serrano and N. A. Smith. Is attention interpretable? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2931–2951, 2019.
  39. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, 2013.
  40. Undecidable theories, volume 13. Elsevier, 1953.
  41. M. Tutek and J. Šnajder. Staying true to your word: (How) can attention become explanation? In Proceedings of the 5th Workshop on Representation Learning for NLP, pages 131–142, 2020.
  42. Attention interpretability across NLP tasks. arXiv preprint arXiv:1909.11218, 2019.
  43. J. Vig and Y. Belinkov. Analyzing the structure of attention in a transformer language model. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 63–76, 2019.
  44. Larger language models do in-context learning differently. arXiv preprint arXiv:2303.03846, 2023.
  45. S. Wiegreffe and Y. Pinter. Attention is not not explanation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 11–20, 2019.
  46. K. Yin and G. Neubig. Interpreting language models with contrastive explanations. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 184–198, 2022.
  47. Character-level convolutional networks for text classification. Advances in Neural Information Processing Systems, 28, 2015.