How Robust are LLMs to In-Context Majority Label Bias? (2312.16549v1)
Abstract: In the In-Context Learning (ICL) setup, various forms of label bias can manifest. One such manifestation is majority label bias, which arises when the distribution of labeled examples in the in-context samples is skewed towards one or more specific classes, making LLMs more prone to predicting those labels. Such skews can arise from various factors, including logistical constraints, inherent biases in data collection methods, and limited access to diverse data sources, which are often unavoidable in a real-world industry setup. In this work, we study the robustness of in-context learning in LLMs to label-distribution shifts caused by majority label bias, focusing on text classification tasks. Prior work has shown that in-context learning with LLMs is susceptible to such biases. In our study, we go one level deeper and show that the robustness boundary varies widely across models and tasks, with certain LLMs remaining highly robust (~90%) to majority label bias. Our findings also highlight how model size and the richness of instructional prompts contribute to model robustness. We restrict our study to publicly available open-source models to ensure transparency and reproducibility.
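To make the setup concrete, below is a minimal sketch, not taken from the paper, of how one might construct a majority-label-biased k-shot prompt for a binary sentiment classification task. The function name `build_biased_prompt`, the bias ratio, and the toy reviews are illustrative assumptions; the paper's exact sampling protocol, tasks, and prompt templates may differ.

```python
import random

# Minimal sketch (illustrative, not the paper's protocol): build a k-shot
# prompt whose in-context demonstrations are skewed toward one "majority"
# label, to probe how strongly an LLM's predictions drift toward that label.

def build_biased_prompt(examples, majority_label, bias_ratio=0.75, k=4, seed=0):
    """Sample k demonstrations, ~bias_ratio of which carry majority_label.

    `examples` is a list of (text, label) pairs. All names and defaults here
    are hypothetical choices for illustration.
    """
    rng = random.Random(seed)
    majority = [ex for ex in examples if ex[1] == majority_label]
    minority = [ex for ex in examples if ex[1] != majority_label]
    n_major = min(len(majority), round(k * bias_ratio))
    n_minor = min(len(minority), k - n_major)
    shots = rng.sample(majority, n_major) + rng.sample(minority, n_minor)
    rng.shuffle(shots)

    lines = ["Classify the sentiment of each review as positive or negative.\n"]
    for text, label in shots:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append("Review: {query}\nSentiment:")  # query slot filled at inference
    return "\n".join(lines)


# Toy usage (hypothetical data): 3 of the 4 demonstrations end up "positive".
demos = [
    ("Great acting and a moving story.", "positive"),
    ("A delightful surprise from start to finish.", "positive"),
    ("Visually stunning and well paced.", "positive"),
    ("Charming, funny, and heartfelt.", "positive"),
    ("Boring plot and wooden dialogue.", "negative"),
    ("A tedious, forgettable mess.", "negative"),
]
print(build_biased_prompt(demos, majority_label="positive"))
```

Sweeping `bias_ratio` from balanced to fully skewed, and comparing prediction distributions across models and prompt richness, is one way the robustness boundary described in the abstract could be probed.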
Authors:
- Karan Gupta
- Sumegh Roychowdhury
- Siva Rajesh Kasa
- Santhosh Kumar Kasa
- Anish Bhanushali
- Nikhil Pattisapu
- Prasanna Srinivasa Murthy