Post Hoc Explanations of Language Models Can Improve Language Models

Published May 19, 2023 in cs.CL and cs.AI


Large Language Models (LLMs) have demonstrated remarkable capabilities in performing complex tasks. Moreover, recent research has shown that incorporating human-annotated rationales (e.g., Chain-of-Thought prompting) during in-context learning can significantly enhance the performance of these models, particularly on tasks that require reasoning. However, such rationales are difficult to scale because they require substantial human involvement. In this work, we present a novel framework, Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations (AMPLIFY), which addresses this challenge by automating rationale generation. To this end, we leverage post hoc explanation methods, which output attribution scores (explanations) that capture the influence of each input feature on a model's predictions. More specifically, we construct automated natural language rationales that embed insights from post hoc explanations to provide corrective signals to LLMs. Extensive experiments on real-world datasets demonstrate that AMPLIFY improves prediction accuracy by about 10–25% across a wide range of tasks, including those where prior approaches that rely on human-annotated rationales, such as Chain-of-Thought prompting, fall short. Our work is one of the first attempts to highlight the potential of post hoc explanations as valuable tools for enhancing the effectiveness of LLMs. Furthermore, we conduct additional empirical analyses and ablation studies to demonstrate the impact of each component of AMPLIFY, which in turn yields critical insights for refining in-context learning.
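The general idea of turning attribution scores into natural language rationales for in-context demonstrations can be sketched as follows. This is a minimal illustration under stated assumptions, not the AMPLIFY implementation: the abstract does not specify which model is explained, which post hoc method is used, or how rationales are worded. Here a toy bag-of-words linear scorer stands in for the explained model, leave-one-out (occlusion) attribution stands in for the explanation method, and the rationale template and all helper names (`score`, `occlusion_attributions`, `top_k_words`, `build_rationale`, `build_demo`) are hypothetical.

```python
"""Sketch: verbalize post hoc attribution scores as an in-context rationale.

Assumptions (hypothetical, not taken from the paper): a toy linear scorer
stands in for the explained model, attributions come from leave-one-out
occlusion, and the rationale template below is made up for illustration.
"""
from typing import Dict, List, Tuple

# Hypothetical per-word weights for a toy sentiment scorer (> 0 means positive).
TOY_WEIGHTS: Dict[str, float] = {
    "great": 2.0, "loved": 1.5, "boring": -2.0, "awful": -2.5, "plot": 0.1,
}


def score(tokens: List[str]) -> float:
    """Toy model output: sum of word weights; unknown words contribute 0."""
    return sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)


def occlusion_attributions(tokens: List[str]) -> List[Tuple[str, float]]:
    """Leave-one-out attribution: change in score when each token is removed."""
    full = score(tokens)
    return [
        (tok, full - score(tokens[:i] + tokens[i + 1:]))
        for i, tok in enumerate(tokens)
    ]


def top_k_words(tokens: List[str], label: str, k: int = 2) -> List[str]:
    """Return the k tokens whose attributions push hardest toward `label`."""
    sign = 1.0 if label == "positive" else -1.0
    ranked = sorted(
        occlusion_attributions(tokens), key=lambda p: sign * p[1], reverse=True
    )
    return [tok for tok, _ in ranked[:k]]


def build_rationale(words: List[str], label: str) -> str:
    """Hypothetical template that turns the top attributed words into text."""
    return (
        f"The key words: {', '.join(words)} are important clues "
        f"to predict '{label}' as the correct answer."
    )


def build_demo(text: str, label: str, k: int = 2) -> str:
    """One few-shot demonstration: input, generated rationale, answer."""
    tokens = text.lower().split()
    rationale = build_rationale(top_k_words(tokens, label, k), label)
    return f"Input: {text}\nRationale: {rationale}\nAnswer: {label}"


if __name__ == "__main__":
    # Made-up labeled examples used as in-context demonstrations.
    demos = [
        ("Great cast and I loved it", "positive"),
        ("Boring plot and awful pacing", "negative"),
    ]
    prompt = "\n\n".join(build_demo(t, y) for t, y in demos)
    prompt += "\n\nInput: A great film\nRationale:"
    print(prompt)
```

Occlusion is used only because it is model-agnostic and easy to verify by hand; any attribution method that yields per-token scores (for example gradient-based saliency or SHAP-style values) could be substituted in `occlusion_attributions` without changing the rest of the pipeline.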
