Analyzing "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales"
Introduction
This essay provides a comprehensive overview of "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales," a paper focused on improving the accuracy and interpretability of LLMs in expressing their confidence levels. Current LLMs often produce hallucinated information and are reluctant to signal their uncertainty, which limits where they can be reliably applied. The primary contribution of the paper is the SaySelf framework, which guides LLMs to generate fine-grained confidence estimates together with self-reflective rationales.
Confidence and Rationales in LLMs
SaySelf is designed to tackle the challenges inherent in existing methods that extract confidence from LLMs. Traditional prompting-based techniques often suffer from poor calibration performance and increased inference latency. Training-based approaches are typically limited to providing binary or group-level confidence estimates, which fail to capture the nuanced nature of an LLM’s confidence. The framework proposed in this paper employs a two-pronged strategy:
- Supervised Fine-Tuning: This stage involves constructing a model-specific dataset that includes self-reflective rationales and confidence estimates. The dataset is created by analyzing the inconsistencies in multiple reasoning chains sampled from an LLM and then summarizing the uncertainties in natural language.
- Reinforcement Learning (RL) from Task Supervision: Calibration of the confidence estimates is achieved using reinforcement learning with a specially devised reward function that penalizes overconfidence in incorrect predictions while promoting accurate, high-confidence outputs.
Methodology
The methodology is divided into two essential stages:
Supervised Fine-Tuning
Here, the aim is to construct a dataset in which each sample consists of a question, an answer with its reasoning chain, a self-reflective rationale, and a confidence estimate. This is carried out in several steps (a sketch of the pipeline follows the list):
- Sampling and Clustering: Multiple responses are sampled for each question. Subsequently, clustering is performed based on the semantic similarity of these responses to reduce redundancy and retain representative responses.
- Confidence Estimation: This involves examining the selected response’s correctness relative to a gold standard answer and computing a fine-grained confidence estimate.
- Rationale Generation: An off-the-shelf LLM (e.g., GPT-4) is used to summarize the uncertainties in the responses, providing explanations about the model's knowledge gaps.
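To make these steps concrete, below is a minimal sketch of how such a dataset-construction pipeline could look. It is illustrative rather than the paper's code: the sentence-embedding model, the agglomerative clustering, the exact-match correctness check, the 1-10 confidence scale, and the helper functions sample_responses and summarize_inconsistencies are assumptions made here for the sketch.

```python
# Illustrative sketch of the SFT dataset-construction pipeline described above.
# Assumptions (not from the paper): sentence-transformers for semantic similarity,
# agglomerative clustering, exact-match correctness, and hypothetical
# `sample_responses` / `summarize_inconsistencies` helpers.
from collections import Counter

from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def build_sft_sample(question, gold_answer, sample_responses, summarize_inconsistencies,
                     n_samples=20, distance_threshold=0.3):
    # 1) Sampling: draw multiple reasoning chains + answers from the target LLM.
    responses = sample_responses(question, n=n_samples)  # list of (reasoning, answer)

    # 2) Clustering: group responses by semantic similarity to reduce redundancy.
    answers = [ans for _, ans in responses]
    embeddings = embedder.encode(answers, normalize_embeddings=True)
    labels = AgglomerativeClustering(
        n_clusters=None, metric="cosine", linkage="average",
        distance_threshold=distance_threshold,
    ).fit_predict(embeddings)

    # Keep one representative response per cluster.
    representatives = {label: responses[i] for i, label in enumerate(labels)}

    # 3) Confidence estimation: fraction of sampled answers matching the gold answer,
    #    mapped to a 1-10 scale (one plausible fine-grained scheme).
    n_correct = sum(ans.strip().lower() == gold_answer.strip().lower() for ans in answers)
    confidence = max(1, round(10 * n_correct / len(answers)))

    # 4) Rationale generation: ask a strong off-the-shelf LLM (e.g., GPT-4) to
    #    summarize the inconsistencies across the representative responses.
    rationale = summarize_inconsistencies(question, list(representatives.values()))

    # Use the representative of the largest cluster as the selected response.
    reasoning, answer = representatives[Counter(labels).most_common(1)[0][0]]
    return {
        "question": question,
        "reasoning": reasoning,
        "answer": answer,
        "rationale": rationale,
        "confidence": confidence,
    }
```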
Reinforcement Learning from Task Supervision
Given the limitations of supervised fine-tuning in generating varied confidence levels, reinforcement learning is employed to fine-tune the LLMs further. A reward function is defined to:
- Incentivize correct, high-confidence predictions.
- Penalize incorrect, overconfident outputs.
Proximal Policy Optimization (PPO) is used to optimize the model against this reward.
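A minimal sketch of a reward with this shape is shown below, assuming the model states its confidence on a [0, 1] scale and that task supervision provides a correctness label; the exact functional form used in the paper may differ.

```python
def calibration_reward(confidence: float, is_correct: bool) -> float:
    """Illustrative reward of the shape described above (not the paper's exact formula).

    `confidence` is assumed to lie in [0, 1]; task supervision tells us whether
    the sampled answer is correct.
    """
    if is_correct:
        # Incentivize correct, high-confidence predictions.
        return confidence
    # Penalize incorrect, overconfident outputs: the higher the stated confidence
    # on a wrong answer, the larger the penalty.
    return -confidence
```

In practice, such a scalar reward would be fed into a standard PPO training loop, for example via a library like TRL's PPOTrainer.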
Results and Evaluation
Confidence Calibration Performance
SaySelf demonstrates significant improvements in both Expected Calibration Error (ECE) and AUROC scores across several datasets, including HotpotQA, TruthfulQA, StrategyQA, FEVER, HaluEval, and ParaRel. Specifically, it consistently reduces calibration errors and improves the capability of LLMs to discriminate between correct and incorrect responses. These evaluations underscore the framework's robustness and general applicability.
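For reference, Expected Calibration Error is the weighted average gap between stated confidence and empirical accuracy within confidence bins. The snippet below is a standard, illustrative implementation rather than the paper's evaluation code; AUROC can likewise be computed with sklearn.metrics.roc_auc_score(correct, confidences).

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: the weighted average gap between each bin's
    mean stated confidence and its empirical accuracy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # Include the left edge in the first bin so a confidence of 0.0 is not dropped.
        mask = (confidences >= lo if i == 0 else confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece
```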
Task Performance
In terms of task accuracy, SaySelf maintains performance comparable to existing baselines, showing that the gains in confidence calibration do not come at the cost of task quality.
Faithfulness of Self-Reflective Rationales
A distinctive contribution of this work is the evaluation of the rationales' faithfulness. As validated by GPT-4-based evaluations, the rationales generated by SaySelf effectively capture the models' internal uncertainties and provide coherent, reasonable explanations.
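Such an evaluation can be framed as an LLM-as-judge setup. The sketch below illustrates one plausible arrangement using the OpenAI API; the prompt and the 1-5 rubric are hypothetical and not taken from the paper.

```python
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are evaluating a self-reflective rationale produced by a model.
Question: {question}
Model answer: {answer}
Stated confidence: {confidence}
Rationale: {rationale}
Sampled alternative answers: {alternatives}

Does the rationale faithfully describe the uncertainty visible in the sampled answers
(i.e., the model's apparent knowledge gaps)? Reply with a score from 1 (unfaithful)
to 5 (faithful) and a one-sentence justification."""

def judge_faithfulness(sample, alternatives):
    # Hypothetical judging call; the prompt and rubric above are illustrative,
    # not the paper's exact evaluation protocol.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=sample["question"],
            answer=sample["answer"],
            confidence=sample["confidence"],
            rationale=sample["rationale"],
            alternatives=alternatives,
        )}],
    )
    return response.choices[0].message.content
```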
Ablation Study
The paper also presents a thorough ablation study to validate the contributions of its components, including supervised fine-tuning, the role of rationales, and reinforcement learning with the proposed reward function. Each component is shown to contribute significantly to calibration performance, supporting the design choices made in the SaySelf framework.
Implications and Future Directions
The implications of this work span both theoretical and practical domains. The framework promotes trustworthiness in AI by providing transparent, well-calibrated confidence estimates and detailed self-reflective rationales. These capabilities are crucial for applications where model reliability is paramount, such as in medical diagnosis or legal advisory systems.
Future research could explore enhancing training protocols for LLMs using SaySelf, which may include proactive learning algorithms aimed at continuous improvement of LLMs through human interaction. Investigation into more complex reward functions or alternative RL algorithms could also further enhance calibration performance.
Conclusion
The paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" addresses a critical gap in the capabilities of LLMs. By employing a combination of supervised fine-tuning and reinforcement learning, SaySelf provides a robust framework for generating accurate confidence estimates and insightful rationales, enhancing both the interpretability and reliability of LLMs.
With the promising outcomes presented, SaySelf sets a notable example for future explorations aimed at improving the robustness and transparency of artificial intelligence systems.