Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models (2407.16221v2)
Abstract: Abstention Ability (AA) is a critical aspect of LLM reliability, referring to an LLM's capability to withhold a response when it is uncertain or lacks a definitive answer, without compromising performance. Although previous studies have attempted to improve AA, they lack a standardised evaluation method and remain unsuitable for black-box models, where token prediction probabilities are inaccessible. This makes comparative analysis challenging, especially for state-of-the-art closed-source commercial LLMs. This paper bridges this gap by introducing a black-box evaluation approach and a new dataset, Abstain-QA, crafted to rigorously assess AA across varied question types (answerable and unanswerable), domains (well-represented and under-represented), and task types (fact-centric and reasoning). We also propose a new confusion matrix, the "Answerable-Unanswerable Confusion Matrix" (AUCM), which serves as the basis for evaluating AA by offering a structured and precise approach to assessment. Finally, we explore the impact of three prompting strategies (Strict Prompting, Verbal Confidence Thresholding, and Chain-of-Thought (CoT)) on improving AA. Our results indicate that even powerful models such as GPT-4 and Mixtral 8x22B encounter difficulties with abstention; however, strategic approaches such as Strict Prompting and CoT can enhance this capability.
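To make the AUCM idea concrete: each model response is judged along two axes, whether the question is actually answerable and whether the model answered or abstained. The sketch below is a minimal illustration assuming a simple five-cell breakdown and two derived rates; the cell names, field names, and metric definitions are illustrative choices for this sketch, not the paper's published interface.

```python
from dataclasses import dataclass

@dataclass
class AUCM:
    """Illustrative Answerable-Unanswerable Confusion Matrix.

    Counts model behaviour separately on answerable and unanswerable
    questions. Field and metric names are assumptions made for this
    sketch; the paper defines the matrix, not this exact interface.
    """
    ans_correct: int = 0    # answerable question, correct answer given
    ans_wrong: int = 0      # answerable question, wrong answer given
    ans_abstain: int = 0    # answerable question, model abstained (over-abstention)
    unans_abstain: int = 0  # unanswerable question, model abstained (desired)
    unans_answer: int = 0   # unanswerable question, model answered anyway

    def abstention_rate(self) -> float:
        """Fraction of unanswerable questions the model correctly abstained on."""
        total = self.unans_abstain + self.unans_answer
        return self.unans_abstain / total if total else 0.0

    def over_abstention_rate(self) -> float:
        """Fraction of answerable questions the model wrongly abstained on."""
        total = self.ans_correct + self.ans_wrong + self.ans_abstain
        return self.ans_abstain / total if total else 0.0


# Example: a model that abstains on 40 of 100 unanswerable questions
# while wrongly abstaining on 10 of 100 answerable ones.
m = AUCM(ans_correct=75, ans_wrong=15, ans_abstain=10,
         unans_abstain=40, unans_answer=60)
print(f"abstention rate: {m.abstention_rate():.2f}")            # 0.40
print(f"over-abstention rate: {m.over_abstention_rate():.2f}")  # 0.10
```

Tracking both rates reflects the "without compromising performance" requirement: a model can trivially maximise abstention on unanswerable questions by abstaining everywhere, so over-abstention on answerable questions must stay low at the same time.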
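Because the evaluation is black-box, the prompting strategies need only the model's text output, never token probabilities. Below is a minimal sketch of two of them, Strict Prompting and Verbal Confidence Thresholding; the prompt wording, the `ask` callable, and the threshold of 80 are assumptions for illustration, not the paper's exact protocol.

```python
from typing import Callable

# Strict Prompting: explicitly instruct the model to abstain unless certain.
# (Illustrative wording, not the paper's exact prompt.)
STRICT_SUFFIX = (
    "Answer only if you are completely certain. If you are unsure, or the "
    "question has no definitive answer, reply with exactly: I Don't Know."
)

def verbal_confidence_abstain(ask: Callable[[str], str],
                              question: str,
                              threshold: int = 80) -> str:
    """Verbal Confidence Thresholding in a black-box setting (illustrative).

    `ask` is any callable that sends a prompt to an LLM and returns its text
    reply; no access to prediction probabilities is assumed. The model is
    asked to verbalise a 0-100 confidence score, and we abstain on its
    behalf whenever the stated confidence falls below `threshold`.
    """
    prompt = (
        f"{question}\n"
        "Give your answer, then on a new line state your confidence as an "
        "integer from 0 to 100 in the form 'Confidence: <n>'."
    )
    reply = ask(prompt)

    confidence = 0
    for line in reply.splitlines():
        if line.strip().lower().startswith("confidence:"):
            digits = "".join(ch for ch in line if ch.isdigit())
            confidence = int(digits) if digits else 0

    return "I Don't Know" if confidence < threshold else reply
```

Chain-of-Thought composes naturally with either strategy: the same `ask` callable is given an instruction to reason step by step before committing to an answer or to abstention, with only the final answer line scored against the AUCM.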
- OpenAI. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023).
- From sparse to dense: GPT-4 summarization with chain of density prompting. arXiv preprint arXiv:2309.04269 (2023).
- Amos Azaria and Tom Mitchell. 2023. The internal state of an LLM knows when it's lying. arXiv preprint arXiv:2304.13734 (2023).
- Oleksandr Balabanov and Hampus Linander. 2024. Uncertainty quantification in fine-tuned LLMs using LoRA ensembles. arXiv preprint arXiv:2402.12264 (2024).
- Adaptation with self-evaluation to improve selective prediction in LLMs. arXiv preprint arXiv:2310.11689 (2023).
- Shifting attention to relevance: Towards the uncertainty estimation of large language models. arXiv preprint arXiv:2307.01379 (2023).
- Conformal Alignment: Knowing When to Trust Foundation Models with Guarantees. arXiv preprint arXiv:2405.10301 (2024).
- Phrase-based rāga recognition using vector space modeling. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 66–70.
- Dan Hendrycks et al. 2020. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300 (2020).
- Albert Q. Jiang et al. 2024. Mixtral of experts. arXiv preprint arXiv:2401.04088 (2024).
- TM Krishna and Vignesh Ishwar. 2012. Carnatic music: Svara, gamaka, motif and raga identity. In Proceedings of the 2nd CompMusic Workshop, July 12-13, Istanbul, Turkey. Universitat Pompeu Fabra, Barcelona.
- Large language models understand and can be enhanced by emotional stimuli. arXiv preprint arXiv:2307.11760 (2023).
- Large language models in finance: A survey. In Proceedings of the Fourth ACM International Conference on AI in Finance. 374–382.
- Stephanie Lin, Jacob Hilton, and Owain Evans. 2021. TruthfulQA: Measuring how models mimic human falsehoods. arXiv preprint arXiv:2109.07958 (2021).
- Sathwik Tejaswi Madhusudhan and Girish Chowdhary. 2019. DeepSRGM: Sequence classification and ranking in Indian classical music with deep learning. In Proceedings of the 20th International Society for Music Information Retrieval Conference. 533–540.
- When not to trust language models: Investigating effectiveness of parametric and non-parametric memories. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 9802–9822.
- OpenAI. 2023. Models-OpenAI API. https://platform.openai.com/docs/models/gpt-3-5-turbo
- OpenAI. 2024. Models-OpenAI API. https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4
- Raga and tonic identification in Carnatic music. Journal of New Music Research 46, 3 (2017), 229–245.
- Rajeswari Sridhar and TV Geetha. 2009. Raga identification of Carnatic music for music information retrieval. International Journal of Recent Trends in Engineering 1, 1 (2009), 571.
- Mistral AI team. 2024. Cheaper, Better, Faster, Stronger. https://mistral.ai/news/mixtral-8x22b/
- Uncertainty-Based Abstention in LLMs Improves Safety and Reduces Hallucinations. arXiv preprint arXiv:2404.10960 (2024).
- Hugo Touvron et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023).
- Neeraj Varshney and Chitta Baral. 2023. Post-abstention: Towards reliably re-attempting the abstained instances in QA. arXiv preprint arXiv:2305.01812 (2023).
- KG Vijayakrishnan. 2007. The grammar of Carnatic music. Mouton de Gruyter.
- Xuezhi Wang et al. 2022. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171 (2022).
Authors: Nishanth Madhusudhan, Sathwik Tejaswi Madhusudhan, Vikas Yadav, Masoud Hashemi