Properties and Challenges of LLM-Generated Explanations (2402.10532v1)

Published 16 Feb 2024 in cs.CL, cs.AI, cs.HC, cs.LG, and cs.CY

Abstract: The self-rationalising capabilities of LLMs have been explored in restricted settings, using task-specific data sets. However, current LLMs do not (only) rely on specifically annotated data; nonetheless, they frequently explain their outputs. The properties of the generated explanations are influenced by the pre-training corpus and by the target data used for instruction fine-tuning. As the pre-training corpus includes a large amount of human-written explanations "in the wild", we hypothesise that LLMs adopt common properties of human explanations. By analysing the outputs for a multi-domain instruction fine-tuning data set, we find that generated explanations show selectivity and contain illustrative elements, but are less frequently subjective or misleading. We discuss reasons and consequences of the properties' presence or absence. In particular, we outline positive and negative implications depending on the goals and user groups of the self-rationalising system.
