In-Context Explainers: Harnessing LLMs for Explaining Black Box Models (2310.05797v4)
Abstract: Recent advancements in LLMs have demonstrated exceptional capabilities in complex tasks like machine translation, commonsense reasoning, and language understanding. One of the primary reasons for the adaptability of LLMs to such diverse tasks is their in-context learning (ICL) capability, which allows them to perform well on new tasks given only a few task samples in the prompt. Despite its effectiveness in enhancing LLM performance on diverse language and tabular tasks, ICL has not been thoroughly explored for its potential to generate post hoc explanations. In this work, we carry out one of the first explorations of the effectiveness of LLMs in explaining other complex predictive models using ICL. To this end, we propose a novel framework, In-Context Explainers, comprising three approaches that exploit the ICL capabilities of LLMs to explain the predictions made by other predictive models. We conduct extensive analysis with these approaches on real-world tabular and text datasets and demonstrate that LLMs can explain other predictive models on par with state-of-the-art post hoc explainers, opening up promising avenues for future research into LLM-based post hoc explanations of complex predictive models.
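The paper's exact prompt templates are not reproduced here, but the core idea of ICL-based post hoc explanation can be sketched: serialize a handful of (input, black-box prediction) pairs into a prompt, then ask the LLM which features drive the prediction for a query instance. The snippet below is a minimal illustrative sketch under that assumption, not the authors' method; the synthetic dataset, the stand-in logistic-regression "black box", the feature names `f0`–`f3`, the prompt wording, and the `build_icl_prompt` helper are all hypothetical.

```python
# Minimal sketch of ICL-style prompt construction for post hoc explanation.
# NOTE: everything here (data, model, feature names, prompt text) is an
# illustrative assumption, not the paper's exact procedure.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a stand-in "black box" classifier on synthetic tabular data.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)
feature_names = ["f0", "f1", "f2", "f3"]  # hypothetical feature names

def build_icl_prompt(demos, query, k=4):
    """Serialize k (input, black-box prediction) demonstrations plus one
    query instance into a prompt asking the LLM to rank feature importance."""
    lines = ["Each line shows a model input and the black-box model's output."]
    for x in demos[:k]:
        feats = ", ".join(f"{n}={v:.2f}" for n, v in zip(feature_names, x))
        lines.append(f"Input: {feats} -> Prediction: {model.predict([x])[0]}")
    feats = ", ".join(f"{n}={v:.2f}" for n, v in zip(feature_names, query))
    lines.append(f"Input: {feats} -> Prediction: {model.predict([query])[0]}")
    lines.append(
        "Based on the examples above, rank the features from most to least "
        "important for the model's prediction on the last input."
    )
    return "\n".join(lines)

prompt = build_icl_prompt(X[:8], X[100])
print(prompt)  # send this prompt to an LLM of your choice via its API
```

The demonstrations give the LLM local evidence about the black box's input-output behavior, so its answer can be compared against feature attributions from standard post hoc explainers such as LIME or SHAP.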
Authors: Nicholas Kroeger, Dan Ley, Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju