Being Right for Whose Right Reasons? (2306.00639v2)

Published 1 Jun 2023 in cs.CL and cs.HC

Abstract: Explainability methods are used to benchmark the extent to which model predictions align with human rationales, i.e., are 'right for the right reasons'. Previous work has failed to acknowledge, however, that what counts as a rationale is sometimes subjective. This paper presents what we believe to be a first-of-its-kind collection of human rationale annotations augmented with the annotators' demographic information. We cover three datasets spanning sentiment analysis and common-sense reasoning, and six demographic groups (balanced across age and ethnicity). Such data enables us to ask both what demographics our predictions align with and whose reasoning patterns our models' rationales align with. We find systematic inter-group annotator disagreement and show how 16 Transformer-based models align better with rationales provided by certain demographic groups: models are biased towards aligning best with older and/or white annotators. We zoom in on the effects of model size and model distillation, finding -- contrary to our expectations -- negative correlations between model size and rationale agreement, as well as no evidence that either model size or model distillation improves fairness.
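The core measurement described in the abstract, scoring how well a model's rationale (its token attributions) agrees with human rationale annotations separately for each demographic group, can be illustrated with a small sketch. This is a minimal sketch, not the paper's code: the agreement metric (token-level F1 against the model's top-k attributed tokens) and all field names are assumptions made for illustration only.

```python
# Minimal sketch (not the authors' implementation): compare a model's
# token-attribution scores against human rationale annotations, broken
# down by annotator demographic group. The metric (token-level F1 on the
# model's top-k attributed tokens) and the field names are assumptions.
from collections import defaultdict
from typing import Dict, List


def token_f1(pred_tokens: set, gold_tokens: set) -> float:
    """Token-level F1 between model-attributed tokens and a human rationale."""
    if not pred_tokens and not gold_tokens:
        return 1.0
    if not pred_tokens or not gold_tokens:
        return 0.0
    tp = len(pred_tokens & gold_tokens)
    precision = tp / len(pred_tokens)
    recall = tp / len(gold_tokens)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rationale_agreement_by_group(
    attributions: Dict[str, List[float]],  # instance id -> per-token importance scores
    annotations: List[dict],               # {"instance_id", "group", "rationale": [token indices]}
    k: int = 5,
) -> Dict[str, float]:
    """Average model-human rationale agreement per demographic group."""
    per_group = defaultdict(list)
    for ann in annotations:
        scores = attributions[ann["instance_id"]]
        # Take the k tokens the model deems most important as its "rationale".
        top_k = set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])
        per_group[ann["group"]].append(token_f1(top_k, set(ann["rationale"])))
    return {group: sum(vals) / len(vals) for group, vals in per_group.items()}


# Toy example: the model's top-attributed tokens match one group's rationale
# better than another's, which is the kind of gap the paper measures.
scores = {"ex1": [0.9, 0.1, 0.8, 0.05, 0.7]}
anns = [
    {"instance_id": "ex1", "group": "age_60+", "rationale": [0, 2, 4]},
    {"instance_id": "ex1", "group": "age_18-29", "rationale": [1, 3]},
]
print(rationale_agreement_by_group(scores, anns, k=3))
# {'age_60+': 1.0, 'age_18-29': 0.0}
```

With attributions produced by any explainability method (attention flow, layer-wise relevance propagation, gradients, and so on), per-group averages of this kind are what would reveal whether a model aligns better with, say, older or white annotators.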

Authors (3)
  1. Terne Sasha Thorn Jakobsen (4 papers)
  2. Laura Cabello (9 papers)
  3. Anders Søgaard (121 papers)
Citations (8)
