Latent Concept-based Explanation of NLP Models (2404.12545v3)
Abstract: Interpreting and understanding the predictions made by deep learning models poses a formidable challenge due to their inherently opaque nature. Many previous efforts aimed at explaining these predictions rely on input features, specifically, the words within NLP models. However, such explanations are often less informative due to the discrete nature of these words and the limited context they convey. To address this limitation, we introduce the Latent Concept Attribution method (LACOAT), which generates explanations for predictions based on latent concepts. Our foundational intuition is that a word can exhibit multiple facets, contingent upon the context in which it is used. Therefore, given a word in context, the latent space derived from our training process reflects a specific facet of that word. LACOAT functions by mapping the representations of salient input words into the training latent space, allowing it to provide latent context-based explanations of the prediction.
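The pipeline the abstract describes can be sketched in two steps: cluster contextual word representations from the training set into latent concepts, then map the representations of salient input words to their nearest concept at explanation time. The sketch below is a minimal, hypothetical illustration: the function names are my own, the embeddings are assumed to be NumPy arrays, and plain k-means stands in for whatever clustering procedure the paper actually uses.

```python
import numpy as np

def learn_latent_concepts(train_reprs, k, iters=50, seed=0):
    """Cluster contextual word representations into k latent concepts.

    A stand-in for LACOAT's training-time latent-space construction:
    each cluster centroid represents one facet ("concept") of word usage.
    """
    rng = np.random.default_rng(seed)
    # initialize centroids from k random training representations
    centroids = train_reprs[rng.choice(len(train_reprs), k, replace=False)].copy()
    for _ in range(iters):
        # assign every representation to its nearest centroid
        dists = np.linalg.norm(train_reprs[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        for c in range(k):
            members = train_reprs[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return centroids

def explain_with_concepts(salient_reprs, centroids):
    """Map salient input-word representations into the training latent space,
    returning the index of the nearest latent concept for each word."""
    dists = np.linalg.norm(salient_reprs[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1)
```

At explanation time, one would feed in the contextual representations of the words a saliency method flags as important, and report the matched concepts (e.g. the representative training words of each cluster) as the explanation.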