Post Hoc Explanations of Language Models Can Improve Language Models (2305.11426v3)

Published 19 May 2023 in cs.CL and cs.AI

Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in performing complex tasks. Moreover, recent research has shown that incorporating human-annotated rationales (e.g., Chain-of-Thought prompting) during in-context learning can significantly enhance the performance of these models, particularly on tasks that require reasoning capabilities. However, incorporating such rationales poses challenges in terms of scalability, as it requires a high degree of human involvement. In this work, we present a novel framework, Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations (AMPLIFY), which addresses these challenges by automating the process of rationale generation. To this end, we leverage post hoc explanation methods, which output attribution scores (explanations) capturing the influence of each input feature on model predictions. More specifically, we construct automated natural language rationales that embed insights from post hoc explanations to provide corrective signals to LLMs. Extensive experimentation with real-world datasets demonstrates that our framework, AMPLIFY, leads to prediction accuracy improvements of about 10-25% over a wide range of tasks, including those where prior approaches that rely on human-annotated rationales, such as Chain-of-Thought prompting, fall short. Our work makes one of the first attempts at highlighting the potential of post hoc explanations as valuable tools for enhancing the effectiveness of LLMs. Furthermore, we conduct additional empirical analyses and ablation studies to demonstrate the impact of each component of AMPLIFY, which, in turn, leads to critical insights for refining in-context learning.
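
The abstract describes the AMPLIFY pipeline at a high level: a post hoc explainer assigns attribution scores to input tokens, the most influential tokens are distilled into a short natural-language rationale, and that rationale is attached to the in-context demonstrations given to the LLM. The sketch below illustrates this flow under stated assumptions: it uses a toy leave-one-out (occlusion) attribution with a stand-in scorer, and the function names (`occlusion_attributions`, `build_rationale`, `amplify_prompt`) and rationale template are illustrative, not the paper's implementation.

```python
# Minimal sketch of an AMPLIFY-style pipeline (illustrative only).
# The scorer is a stand-in: in practice attributions would come from a
# post hoc explanation method applied to a real model's predictions.

from typing import Callable, List, Tuple


def occlusion_attributions(
    tokens: List[str],
    score_fn: Callable[[List[str]], float],
) -> List[Tuple[str, float]]:
    """Leave-one-out attribution: drop each token and measure the score change."""
    base = score_fn(tokens)
    scores = []
    for i, tok in enumerate(tokens):
        ablated = tokens[:i] + tokens[i + 1:]
        scores.append((tok, base - score_fn(ablated)))
    return scores


def build_rationale(attributions: List[Tuple[str, float]], k: int = 3) -> str:
    """Turn the top-k most influential tokens into a natural-language rationale."""
    top = sorted(attributions, key=lambda t: t[1], reverse=True)[:k]
    keywords = ", ".join(tok for tok, _ in top)
    return f"The key words: {keywords} are important clues to predict the answer."


def amplify_prompt(example_text: str, example_label: str,
                   rationale: str, test_text: str) -> str:
    """Assemble a few-shot prompt whose demonstration carries the rationale."""
    return (
        f"Input: {example_text}\n"
        f"Rationale: {rationale}\n"
        f"Answer: {example_label}\n\n"
        f"Input: {test_text}\n"
        f"Rationale:"
    )


if __name__ == "__main__":
    # Toy scorer: counts sentiment-bearing words. A real system would use the
    # prediction probability of a trained classifier or proxy model instead.
    positive = {"great", "love", "wonderful"}
    score = lambda toks: sum(t.lower() in positive for t in toks) / max(len(toks), 1)

    demo = "I love this movie , the acting is wonderful".split()
    attrs = occlusion_attributions(demo, score)
    rationale = build_rationale(attrs)
    print(amplify_prompt(" ".join(demo), "positive", rationale,
                         "The plot was dull and the pacing terrible"))
```

Running the sketch prints a few-shot prompt in which the demonstration is augmented with a rationale naming the tokens the explainer found most influential, which is the corrective signal the abstract refers to.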

Authors (6)
  1. Satyapriya Krishna
  2. Jiaqi Ma
  3. Dylan Slack
  4. Asma Ghandeharioun
  5. Sameer Singh
  6. Himabindu Lakkaraju
Citations (40)