The paper introduces a framework for improving the performance of large language models (LLMs) on question answering (QA) by attaching a measure of uncertainty to every prediction. Developed by Themis AI Inc, the framework is agnostic to model type and data, meaning it can be applied to a variety of models and datasets without being constrained by their specific architectures or the nature of the data.
Question answering is a critical task for many LLM applications, where the goal is not just to generate any answer, but to provide accurate and reliable responses. Traditional LLMs can struggle with this, often failing to gauge their confidence appropriately, which can lead to incorrect or misleading answers. The paper attributes these failures to several factors, including out-of-domain data, prompt ambiguities, inconsistent training information, and hallucinations (incorrectly synthesized information).
The researchers present a technique that improves the capability of LLMs in selective QA tasks, which require the model to maintain a high level of accuracy while answering as many questions as possible. Rather than attempting to respond to every query, an LLM with selective prediction can abstain from answering when its confidence is low, thus improving the overall output reliability.
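To make the idea concrete, here is a minimal sketch of threshold-based selective QA. The `Prediction` record, the confidence values, and the thresholds are illustrative assumptions, not details from the paper, and the paper's own abstention mechanism may differ.

```python
# Minimal sketch of selective prediction for QA (not the paper's exact method).
# Assumes each prediction carries a confidence score in [0, 1] and a
# ground-truth correctness flag; both fields are illustrative.

from dataclasses import dataclass

@dataclass
class Prediction:
    answer: str
    confidence: float   # model-reported confidence, higher = more certain
    is_correct: bool    # whether the answer matches the reference

def selective_qa(predictions: list[Prediction], threshold: float):
    """Answer only when confidence meets the threshold; abstain otherwise."""
    answered = [p for p in predictions if p.confidence >= threshold]
    coverage = len(answered) / len(predictions) if predictions else 0.0
    accuracy = (sum(p.is_correct for p in answered) / len(answered)
                if answered else 0.0)
    return coverage, accuracy

# Example: raising the threshold lowers coverage but should raise accuracy.
preds = [
    Prediction("Paris", 0.95, True),
    Prediction("1947", 0.40, False),
    Prediction("Mount Everest", 0.85, True),
    Prediction("blue whale", 0.55, False),
]
for t in (0.0, 0.5, 0.9):
    cov, acc = selective_qa(preds, t)
    print(f"threshold={t:.1f}  coverage={cov:.2f}  selective accuracy={acc:.2f}")
```

Raising the threshold trades coverage for accuracy, which is exactly the trade-off that selective QA is meant to manage.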
The key to this approach is converting existing LLMs into uncertainty-aware variants that can detect different types of uncertainty in their predictions. The paper distinguishes two main kinds: aleatoric uncertainty, which stems from inherent noise in the data, and epistemic uncertainty, which reflects the limits of the model's knowledge, essentially what the model does not know.
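One common way to estimate these two quantities, which may or may not match the estimator used in the paper, is to run several stochastic forward passes (for example an ensemble or Monte Carlo dropout) and decompose the predictive entropy. The sketch below assumes the per-pass class probabilities are already available.

```python
# Hedged sketch of a standard decomposition of predictive uncertainty into
# aleatoric and epistemic parts, given multiple stochastic forward passes.
# This is a common approximation, not necessarily the paper's estimator.

import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy of a categorical distribution."""
    return -np.sum(p * np.log(p + eps), axis=axis)

def uncertainty_decomposition(probs):
    """
    probs: array of shape (num_passes, num_classes) with class probabilities
           from several stochastic forward passes for one input.
    Returns (total, aleatoric, epistemic) uncertainty estimates.
    """
    mean_p = probs.mean(axis=0)
    total = entropy(mean_p)                     # entropy of the averaged prediction
    aleatoric = entropy(probs, axis=-1).mean()  # mean per-pass entropy (data noise)
    epistemic = total - aleatoric               # mutual information (model disagreement)
    return total, aleatoric, epistemic

# Passes that agree on a confident answer -> low epistemic uncertainty.
agreeing = np.array([[0.9, 0.05, 0.05]] * 5)
# Passes that are each confident but disagree -> high epistemic uncertainty.
disagreeing = np.array([[0.9, 0.05, 0.05],
                        [0.05, 0.9, 0.05],
                        [0.05, 0.05, 0.9],
                        [0.9, 0.05, 0.05],
                        [0.05, 0.9, 0.05]])

print(uncertainty_decomposition(agreeing))
print(uncertainty_decomposition(disagreeing))
```

In this decomposition, the average per-pass entropy captures noise that persists no matter how much the model knows, while the disagreement term grows precisely on inputs the model has not learned well.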
In empirical evaluations covering both extractive and generative QA models, the framework improved accuracy across a range of confidence levels. In particular, the results showed that conventional measures such as softmax probabilities are unreliable confidence indicators: high softmax probabilities often coincided with low accuracy, whereas the uncertainty-aware models achieved more consistent results.
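As a hedged illustration of that comparison (the scores and correctness labels below are invented, not results from the paper), sweeping an abstention threshold over two different confidence signals shows how a better-ranked signal yields higher selective accuracy at every coverage level.

```python
# Illustrative comparison of two confidence signals on the same predictions:
# ranking by raw softmax probability versus ranking by a (negated) uncertainty
# score. All values are toy data made up for the sketch.

import numpy as np

def risk_coverage(scores, correct):
    """
    scores: higher = model is more willing to answer.
    correct: boolean array indicating whether each answer is right.
    Returns (coverage, selective_accuracy) arrays obtained by sweeping a
    threshold from the most confident prediction to the least.
    """
    order = np.argsort(-scores)              # most confident first
    correct_sorted = correct[order]
    n = len(scores)
    coverage = np.arange(1, n + 1) / n
    selective_acc = np.cumsum(correct_sorted) / np.arange(1, n + 1)
    return coverage, selective_acc

# Toy example: softmax confidence is miscalibrated on two wrong answers,
# while the negated uncertainty score ranks them lower.
correct         = np.array([True, True, False, True, False, True])
softmax_conf    = np.array([0.97, 0.95, 0.96, 0.80, 0.90, 0.70])
neg_uncertainty = np.array([0.90, 0.85, 0.30, 0.75, 0.40, 0.60])

for name, scores in [("softmax", softmax_conf), ("uncertainty", neg_uncertainty)]:
    cov, acc = risk_coverage(scores, correct)
    print(name, np.round(acc, 2))
```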
Moreover, the paper reports an algorithmic method that automatically converts LLMs so that they compute these uncertainty metrics efficiently, without adding significant computational overhead or requiring additional models or systems. This is particularly valuable for developers who want to enhance existing models without extensive restructuring or additional resources.
In conclusion, the paper represents a notable stride toward improving the reliability and efficacy of LLMs in QA tasks. By addressing the critical issue of model confidence and introducing an easily integrable way to quantify uncertainty, the researchers provide a pathway toward models that can better discern when to answer a question and when to abstain, leading to more trustworthy AI-based systems.