Finetuning LLMs to Emit Linguistic Expressions of Uncertainty
The paper presents a novel method for aligning the expressed confidence of LLMs with the actual accuracy of their predictions through supervised fine-tuning on uncertainty-augmented data. The approach aims to improve the calibration of LLMs so that they generate linguistic expressions of uncertainty alongside their responses to information-seeking queries, with the goal of making the models more reliable and enhancing user trust through transparent confidence measures.
Background and Motivation
Despite their widespread applicability, LLMs frequently produce outputs with misleading confidence, especially when responding to out-of-distribution queries. The absence of uncertainty expressions in their outputs can lead to over-reliance or complete disregard by end-users, particularly in sensitive domains such as healthcare or law. Addressing this, the paper focuses on enriching LLMs with the ability to express uncertainty in a calibrated manner, bridging the gap between model confidence and prediction correctness.
Methodology
The authors propose an approach with several stages:
- Self-Evaluation for Calibration: The paper uses a true/false self-evaluation task to assess the pre-trained model's confidence in its own outputs. Applying isotonic regression to these self-evaluation scores yields nearly perfect calibration across model sizes (see the calibration sketch after this list).
- Mapping Confidence to Linguistic Terms: A predefined mapping from numerical confidence scores to linguistic terms such as "likely" or "highly unlikely," derived from human perception studies, lets models express uncertainty in a linguistically intuitive manner.
- Fine-Tuning on Augmented Data: The fine-tuning data incorporates uncertainty expressions through three augmentation formats: prefixed, postfixed, and interleaved. Among these, postfixed expressions exhibited the best calibration, since they interfere least with the generative process (see the mapping and augmentation sketch after this list).
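The calibration step can be illustrated with a minimal sketch. It assumes raw P(True)-style self-evaluation scores and binary correctness labels have already been collected on a held-out set; the data and variable names below are hypothetical, not taken from the paper.

```python
# Minimal sketch: calibrating self-evaluation confidence scores with isotonic
# regression. Assumes we already have raw true/false self-evaluation scores
# and binary correctness labels for a held-out set (values are illustrative).
import numpy as np
from sklearn.isotonic import IsotonicRegression

raw_scores = np.array([0.95, 0.80, 0.72, 0.60, 0.55, 0.30, 0.25, 0.10])
correct = np.array([1, 1, 1, 0, 1, 0, 0, 0])

# Fit a monotonic mapping from raw scores to calibrated probabilities.
calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(raw_scores, correct)

# At inference time, map a new self-evaluation score to a calibrated confidence.
new_score = 0.65
calibrated_confidence = calibrator.predict([new_score])[0]
print(f"raw={new_score:.2f} -> calibrated={calibrated_confidence:.2f}")
```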
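The mapping to linguistic terms and the postfixed augmentation format can be sketched in a similar spirit. The thresholds and phrase inventory below are illustrative assumptions; the paper derives its actual mapping from human perception studies.

```python
# Illustrative sketch: map a calibrated confidence to a linguistic uncertainty
# phrase and append it to the answer (the "postfixed" augmentation format).
# Thresholds and phrases are assumptions for illustration only.
LINGUISTIC_MAP = [
    (0.95, "almost certain"),
    (0.75, "likely"),
    (0.50, "somewhat likely"),
    (0.25, "unlikely"),
    (0.00, "highly unlikely"),
]

def confidence_to_phrase(confidence: float) -> str:
    """Return the first linguistic term whose threshold the confidence meets."""
    for threshold, phrase in LINGUISTIC_MAP:
        if confidence >= threshold:
            return phrase
    return LINGUISTIC_MAP[-1][1]

def postfix_example(question: str, answer: str, confidence: float) -> str:
    """Build a fine-tuning target with the uncertainty phrase after the answer."""
    phrase = confidence_to_phrase(confidence)
    return f"Q: {question}\nA: {answer}. It is {phrase} that this is correct."

print(postfix_example("Who wrote Hamlet?", "William Shakespeare", 0.97))
```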
Experimental Results
The paper conducts extensive experiments demonstrating that the proposed approach significantly improves the calibration of LLMs. Across model sizes and training stages, on datasets such as TriviaQA and AmbigQA, postfixed uncertainty expressions yielded the lowest Expected Calibration Error (ECE) and Brier score, indicating better alignment between expressed confidence and accuracy.
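For reference, the two metrics can be computed as follows. This is a generic sketch (equal-width confidence bins for ECE, binary correctness labels), not the paper's exact evaluation code.

```python
# Generic implementations of Expected Calibration Error (ECE) and Brier score,
# given per-example confidences and binary correctness labels.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # Gap between average confidence and accuracy, weighted by bin size.
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

def brier_score(confidences, correct):
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    return float(np.mean((confidences - correct) ** 2))

conf = [0.9, 0.8, 0.6, 0.4, 0.2]
corr = [1, 1, 0, 1, 0]
print(expected_calibration_error(conf, corr), brier_score(conf, corr))
```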
A noteworthy observation from the experiments is that larger models are inherently better calibrated, and well-calibrated models produce linguistic uncertainty expressions that correlate well with answer correctness. Instruction-tuned models, however, showed comparatively poorer calibration.
Implications and Future Directions
The introduction of well-calibrated linguistic uncertainty alongside predictions paves the way for more reliable and transparent AI systems. This advancement is particularly promising for enhancing user trust in AI, especially in domains where the consequences of incorrect or misleading information can be severe.
Practically, this method offers a middle ground between providing a definitive answer and abstaining when uncertainty is high. It gives users an informed basis for decision-making, prompting them to consult additional sources when the model expresses significant uncertainty.
From a theoretical standpoint, these findings underline the importance of calibrating not just the outputs but also the model’s confidence expression mechanisms. The paper suggests incorporating this fine-tuned uncertainty expression framework as an intermediary phase in LLM development, possibly between supervised fine-tuning and RLHF.
Future work could expand upon this by exploring more nuanced expressions of uncertainty, integrating aleatoric uncertainty, and further refining the self-evaluation methodology to strengthen model introspection. Extending the approach to multi-modal models could also yield insights into cross-domain calibration.
In conclusion, the paper contributes substantively to ongoing efforts to make LLMs more reliable through nuanced, calibrated expressions of uncertainty, and marks a significant step toward more trustworthy AI systems.