Mitigating LLM Hallucinations via Conformal Abstention
The paper "Mitigating LLM Hallucinations via Conformal Abstention" addresses a critical issue in the deployment of LLMs: hallucinations. These occur when LLMs generate incorrect or nonsensical answers with undue confidence. The authors propose a principled procedure to determine when an LLM should abstain from answering a question, thus reducing the risk of hallucinations.
Methods and Theoretical Framework
The paper builds upon conformal prediction techniques to develop an abstention mechanism with robust theoretical guarantees. The central idea involves using LLM self-consistency as a measure of output reliability. By sampling multiple responses to a query from the LLM and evaluating their similarity, the process aims to identify situations where the model is prone to hallucinate. Should the similarity across responses fall below a calibrated threshold, the model opts to abstain rather than risk a potentially incorrect response.
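A minimal sketch of this sampling-and-matching loop is shown below. It uses pairwise agreement among sampled responses as the self-consistency score, which is one possible scoring rule rather than the paper's exact procedure; the `generate` and `responses_match` callables and the function names are hypothetical placeholders.

```python
from typing import Callable, List, Optional


def self_consistency_score(
    prompt: str,
    generate: Callable[[str], str],               # hypothetical: samples one LLM response
    responses_match: Callable[[str, str], bool],  # hypothetical: judges whether two answers agree
    num_samples: int = 5,
) -> float:
    """Sample several responses and return the fraction of response pairs that agree."""
    responses: List[str] = [generate(prompt) for _ in range(num_samples)]
    pairs = [
        (responses[i], responses[j])
        for i in range(num_samples)
        for j in range(i + 1, num_samples)
    ]
    agreements = sum(responses_match(a, b) for a, b in pairs)
    return agreements / len(pairs)


def answer_or_abstain(
    prompt: str,
    generate: Callable[[str], str],
    responses_match: Callable[[str, str], bool],
    threshold: float,
) -> Optional[str]:
    """Answer only when the self-consistency score clears the calibrated threshold."""
    score = self_consistency_score(prompt, generate, responses_match)
    if score < threshold:
        return None  # abstain rather than risk a hallucinated answer
    return generate(prompt)
```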
The authors leverage the framework of Conformal Risk Control (CRC) to ensure that the rate of hallucinations, defined here as incorrect or nonsensical answers, remains below a user-specified target level with a finite-sample statistical guarantee. The innovation lies in formulating an abstention procedure that minimizes the abstention rate while controlling the hallucination risk through rigorous statistical guarantees.
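The calibration step can be sketched roughly as follows. This is a simplified illustration of CRC-style calibration with a binary hallucination loss on a held-out calibration set, not the paper's exact procedure; the variable names and the target level `alpha` are assumptions made for the example.

```python
import numpy as np


def calibrate_threshold(
    scores: np.ndarray,    # self-consistency score for each calibration question
    is_wrong: np.ndarray,  # 1 if the model's answer to that question is wrong, else 0
    alpha: float = 0.05,   # target hallucination rate
) -> float:
    """Return the smallest threshold whose conformal risk bound stays below alpha.

    The model answers only when score >= threshold, so the hallucination loss
    on a calibration point is is_wrong * (score >= threshold), which is
    non-increasing in the threshold. Picking the smallest feasible threshold
    keeps the abstention rate as low as possible.
    """
    n = len(scores)
    # Candidate thresholds: the observed scores plus a "never abstain" option.
    candidates = np.unique(np.concatenate([scores, [0.0]]))
    for lam in candidates:  # ascending, so the first feasible value is the smallest
        answered = scores >= lam
        empirical_risk = np.mean(is_wrong * answered)
        # CRC-style finite-sample correction: worst-case loss of 1 on a
        # hypothetical (n + 1)-th point.
        if (n * empirical_risk + 1.0) / (n + 1) <= alpha:
            return float(lam)
    # No candidate satisfies the bound: abstain on everything.
    return float("inf")
```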
Several methods for determining agreement among LLM responses are explored. Particularly interesting is the use of carefully designed prompts that ask the LLM itself to judge whether its sampled responses agree in context. This self-evaluation is coupled with the statistical calibration of the similarity threshold described above, so that the resulting abstention policy is controlled effectively.
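A match function of this kind might look like the sketch below, where `llm` is a hypothetical callable that maps a prompt string to a completion string and the prompt wording is an illustrative stand-in rather than the paper's prompt.

```python
def responses_match(llm, question: str, answer_a: str, answer_b: str) -> bool:
    """Ask the LLM itself whether two candidate answers to a question agree."""
    prompt = (
        "You are comparing two answers to the same question.\n"
        f"Question: {question}\n"
        f"Answer 1: {answer_a}\n"
        f"Answer 2: {answer_b}\n"
        "Do these two answers convey the same information? Reply with 'yes' or 'no'."
    )
    reply = llm(prompt)
    # Treat any reply starting with "yes" as agreement.
    return reply.strip().lower().startswith("yes")
```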
Experimental Results and Evaluations
Experiments were conducted on several datasets, including Temporal Sequences and the closed-book open-domain question-answering dataset TriviaQA. The paper reports that the conformal abstention method reliably controls the hallucination rate while maintaining a competitively low abstention rate compared to baseline methods. For example, on datasets with long responses, the proposed method significantly outperformed approaches based on log-probability scores.
Importantly, the paper highlights the advantage of LLM self-prompting for gauging response consistency over traditional log-probability approaches. The experiments confirm that, when properly calibrated, the self-evaluation capability of LLMs provides a dependable signal for governing the abstention policy.
Theoretical and Practical Implications
The authors' contribution is notable in both theoretical and practical contexts. Theoretically, the integration of conformal prediction with LLM response evaluation fills a critical gap in the reliability of LLM outputs, ensuring that the rate of erroneous outputs can be bounded with high statistical confidence. Practically, this provides an operational framework for deploying LLMs in high-stakes applications where hallucinations could carry significant risks.
Future Directions
The paper opens several avenues for future research. Extending the methodology to adapt dynamically to varying contexts or question types could further improve the performance of conformal abstention. Additionally, further refinement of self-prompting techniques and exploration of ensemble methods for similarity assessment could provide more granular control over LLM predictions.
In conclusion, the paper presents a robust and theoretically grounded approach to mitigating hallucinations in LLMs through conformal abstention. The results showcase the efficacy of the proposed method and provide a practical framework for the reliable deployment of LLMs in real-world applications.