Exploring the Trade-off Between Model Performance and Explanation Plausibility of Text Classifiers Using Human Rationales (2404.03098v1)

Published 3 Apr 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Saliency post-hoc explainability methods are important tools for understanding increasingly complex NLP models. While these methods can reflect the model's reasoning, they may not align with human intuition, rendering the explanations implausible. In this work, we present a methodology for incorporating rationales, which are text annotations explaining human decisions, into text classification models. This incorporation enhances the plausibility of post-hoc explanations while preserving their faithfulness. Our approach is agnostic to model architectures and explainability methods. We introduce the rationales during model training by augmenting the standard cross-entropy loss with a novel loss function inspired by contrastive learning. By leveraging a multi-objective optimization algorithm, we explore the trade-off between the two loss functions and generate a Pareto-optimal frontier of models that balance performance and plausibility. Through extensive experiments involving diverse models, datasets, and explainability methods, we demonstrate that our approach significantly enhances the quality of model explanations with only minor (sometimes negligible) degradation in the original model's performance.
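
The core mechanism outlined above, a standard cross-entropy task loss augmented with a contrastive-inspired rationale loss, can be illustrated with a short sketch. The following PyTorch snippet is a hypothetical illustration, not the paper's actual implementation: the names (`combined_loss`, `token_scores`, `rationale_mask`, `alpha`, `tau`) are assumptions, and a simple weighted scalarization over `alpha` stands in for the paper's multi-objective optimization algorithm.

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, labels, token_scores, rationale_mask,
                  alpha=0.5, tau=0.1):
    """Weighted sum of a task loss and a contrastive-style rationale loss.

    logits:         (batch, n_classes) classifier outputs
    labels:         (batch,) gold class indices
    token_scores:   (batch, seq_len) per-token saliency/attribution scores
    rationale_mask: (batch, seq_len) 1 where a human annotator marked the
                    token as part of the rationale, 0 elsewhere
    alpha:          trade-off weight between performance and plausibility
    tau:            temperature for the softmax over token scores
    """
    # Standard task loss (cross-entropy, as in the abstract).
    ce = F.cross_entropy(logits, labels)

    # Contrastive-style plausibility term: treat rationale tokens as
    # positives and push the saliency distribution's probability mass
    # onto them, away from the remaining (negative) tokens.
    log_probs = F.log_softmax(token_scores / tau, dim=-1)
    pos = rationale_mask.float()
    rationale_ll = (log_probs * pos).sum(-1) / pos.sum(-1).clamp(min=1.0)
    plausibility = -rationale_ll.mean()

    return (1 - alpha) * ce + alpha * plausibility
```

Retraining while sweeping `alpha` over [0, 1] would trace out an approximation of the performance-plausibility Pareto frontier that the paper obtains with a dedicated multi-objective optimizer.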

Authors (3)
  1. Lucas E. Resck (4 papers)
  2. Marcos M. Raimundo (3 papers)
  3. Jorge Poco (10 papers)