Measuring Information in Text Explanations
Abstract: Text-based explanation is a particularly promising approach in explainable AI, but how text explanations are evaluated depends on the explanation method. We argue that placing explanations in an information-theoretic framework could unify the evaluation of two popular text explanation methods: rationales and natural language explanations (NLEs). This framework treats the post-hoc text pipeline as a series of communication channels, which we refer to as "explanation channels". We quantify the information flow through these channels, which facilitates the assessment of explanation characteristics. We build tools for quantifying two information scores: relevance and informativeness. We illustrate what these scores measure by comparing them against several traditional evaluation metrics. Our information-theoretic scores reveal distinctive properties of the underlying mechanisms of two representative types of text explanations. For example, NLEs trade off slightly between transmitting input-related information and target-related information, whereas rationales do not exhibit such a trade-off. Our work contributes to ongoing efforts to establish rigorous and standardized evaluation criteria in the rapidly evolving field of explainable AI.
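The abstract frames relevance and informativeness as amounts of information flowing through explanation channels, that is, as mutual information between pairs of variables such as the input, the explanation, and the target. The paper's own estimators are not reproduced here. As a minimal sketch, assuming the variables have already been discretized (for instance, into hypothetical cluster IDs of explanation embeddings), the plug-in estimator below computes I(A; B) in bits from paired samples. The function name `plugin_mutual_information` and the toy data are hypothetical illustrations, not part of the paper.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def plugin_mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples.

    Hypothetical helper for illustration; it replaces the true distributions
    with empirical frequencies:
        I(X; Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) * p(y)) )
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")

    joint = Counter(zip(xs, ys))  # counts of each observed (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y

    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to count * n / (px[x] * py[y])
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


if __name__ == "__main__":
    # Toy, made-up data: hypothetical explanation cluster IDs paired with gold labels.
    explanation_clusters = [0, 0, 1, 1, 2, 2]
    labels = ["entail", "entail", "contra", "contra", "neutral", "entail"]
    print(plugin_mutual_information(explanation_clusters, labels))
```

Plug-in estimates are biased upward on small samples and do not apply directly to continuous embeddings, which is why continuous or variational mutual information estimators are typically preferred in practice; this sketch only illustrates the quantity being estimated.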