Towards Understanding In-Context Learning with Contrastive Demonstrations and Saliency Maps (2307.05052v4)
Abstract: We investigate the role of various demonstration components in the in-context learning (ICL) performance of LLMs. Specifically, we explore the impact of ground-truth labels, input distribution, and complementary explanations, particularly when these are altered or perturbed. We build on previous work, which offers mixed findings on how these elements influence ICL. To probe these questions, we employ explainable NLP (XNLP) methods and use saliency maps of contrastive demonstrations for both qualitative and quantitative analysis. We find that flipping ground-truth labels significantly affects saliency, and that this effect is more pronounced in larger LLMs. Analyzing the input distribution at a finer granularity, we find that changing sentiment-indicative terms in a sentiment analysis task to neutral ones has a less substantial impact than flipping ground-truth labels. Finally, the effectiveness of complementary explanations in boosting ICL performance is task-dependent: they offer limited benefit on sentiment analysis compared to symbolic reasoning tasks. These insights are critical for understanding how LLMs function and for guiding the design of effective demonstrations, which is increasingly relevant given the growing use of LLMs in applications such as ChatGPT. Our research code is publicly available at https://github.com/paihengxu/XICL.
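Below is a minimal sketch of the kind of saliency analysis the abstract describes: it computes gradient-x-input saliency over the prompt tokens of a sentiment ICL demonstration and its label-flipped counterpart. The model name (gpt2), the prompt text, and the gradient-x-input attribution choice are illustrative assumptions, not the paper's exact setup; the authors' actual implementation is in the linked XICL repository.

```python
# Illustrative sketch (not the authors' code): gradient-x-input saliency over
# an ICL prompt, comparing an original vs. a label-flipped demonstration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper studies larger LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def token_saliency(prompt: str, target: str):
    """Return (tokens, saliency) with |grad x input| per prompt token."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    target_ids = tokenizer(target, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, target_ids], dim=1)

    # Re-embed the inputs so we can take gradients w.r.t. the embeddings.
    embeds = model.get_input_embeddings()(input_ids).detach().requires_grad_(True)
    logits = model(inputs_embeds=embeds).logits

    # Log-probability of the target tokens given the preceding context.
    n_prompt = prompt_ids.shape[1]
    log_probs = torch.log_softmax(logits[0, n_prompt - 1 : -1], dim=-1)
    score = log_probs.gather(1, target_ids[0].unsqueeze(1)).sum()
    score.backward()

    sal = (embeds.grad[0, :n_prompt] * embeds[0, :n_prompt]).abs().sum(dim=-1)
    tokens = tokenizer.convert_ids_to_tokens(prompt_ids[0].tolist())
    return tokens, sal.detach()

original = ("Review: a heartfelt, moving film. Sentiment: positive\n"
            "Review: dull and lifeless. Sentiment:")
flipped = ("Review: a heartfelt, moving film. Sentiment: negative\n"
           "Review: dull and lifeless. Sentiment:")

for name, prompt in [("original", original), ("flipped", flipped)]:
    tokens, sal = token_saliency(prompt, " negative")
    top5 = sorted(zip(tokens, sal.tolist()), key=lambda x: -x[1])[:5]
    print(name, top5)
```

Comparing the two token rankings gives a rough, per-token view of how flipping a demonstration's label shifts saliency over the prompt, mirroring the contrastive-demonstration analysis described above.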
Authors: Paiheng Xu, Fuxiao Liu, Zongxia Li, Hyemi Song, Yue Feng