Essay: "LLMs can learn self-restraint through iterative self-reflection"
The paper "LLMs can learn self-restraint through iterative self-reflection" presents a novel approach to enhancing the reliability and accuracy of LLMs by teaching them to exhibit self-restraint. The authors, Alexandre Piché, Aristides Milios, Dzmitry Bahdanau, and Chris Pal, propose a framework that reduces hallucinations in LLM outputs and introduces the concept of self-restraint: an LLM's ability to modulate its responses based on its internal knowledge and to abstain from answering when necessary.
Key Contributions
The primary contribution of this paper is the ReSearch algorithm, which combines iterative self-reflection with synthetic data generation to teach LLMs self-restraint, together with the utility function that drives it. The utility function is designed to encourage LLMs to produce responses only when they are confident, reducing false claims while maximizing the number of true claims: it assigns positive scores to true claims, penalizes false claims, and defines a threshold beyond which abstention is more favorable than providing an inaccurate answer.
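To make this scoring concrete, here is a minimal sketch assuming a simple linear form: each true claim earns a fixed reward, each false claim a fixed penalty, and abstention scores zero. The function names, the penalty value, and the abstention rule are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of a claim-level utility function in the spirit of the
# description above; the paper's exact scoring and threshold may differ.

def utility(n_true: int, n_false: int, penalty: float = 2.0) -> float:
    """Reward true claims and penalize false ones.

    With penalty > 1, a response whose false claims outweigh its true claims
    scores below zero, so abstaining (utility 0) becomes preferable.
    """
    return n_true - penalty * n_false

def should_abstain(n_true: int, n_false: int, penalty: float = 2.0) -> bool:
    """Abstain whenever answering scores lower than the abstention utility of 0."""
    return utility(n_true, n_false, penalty) < 0.0

# Example: 3 likely-true claims and 2 likely-false claims -> utility = -1, so abstain.
print(should_abstain(3, 2))  # True
```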
Methodology
The ReSearch algorithm is central to this paper's contribution. It is an iterative procedure that combines self-evaluation and self-prompting to refine the model's outputs over successive rounds (a simplified sketch follows the steps below):
- Generation: The model generates multiple potential responses to a query.
- Self-evaluation: These responses are evaluated using a self-consistency measure, estimating the likelihood of each claim being true.
- Self-prompting: The model self-prompts using claims deemed likely to be true, improving the input conditions for subsequent iterations.
Finally, by evaluating all generated samples and incorporating an abstention response, ReSearch produces a synthetic dataset that can be used to fine-tune LLMs, teaching them to exhibit self-restraint without requiring external references.
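The following is a heavily simplified sketch of one such iteration. It assumes that atomic claims are extracted from each sampled answer, that self-consistency is approximated by how often a claim recurs across samples, and that trusted claims are folded back into the prompt for the next round. All function names, signatures, and parameter values are illustrative placeholders, not the authors' implementation.

```python
from collections import Counter
from typing import Callable, List, Tuple

def research_iteration(
    generate: Callable[[str], str],              # question -> one sampled answer
    extract_claims: Callable[[str], List[str]],  # answer -> list of atomic claims
    question: str,
    n_samples: int = 8,
    confidence_threshold: float = 0.7,
) -> Tuple[List[Tuple[str, float]], str]:
    """One generation / self-evaluation / self-prompting cycle (illustrative only)."""
    # 1. Generation: sample several candidate answers to the same question.
    candidates = [generate(question) for _ in range(n_samples)]

    # 2. Self-evaluation: estimate each claim's truthfulness by self-consistency,
    #    here approximated as the fraction of samples in which the claim appears.
    claim_counts = Counter(c for ans in candidates for c in extract_claims(ans))
    scored = [(claim, count / n_samples) for claim, count in claim_counts.items()]

    # 3. Self-prompting: keep claims above the confidence threshold and fold them
    #    back into the prompt used for the next iteration.
    trusted = [claim for claim, score in scored if score >= confidence_threshold]
    next_prompt = question + "\nKnown facts:\n" + "\n".join(f"- {c}" for c in trusted)
    return scored, next_prompt
```

In the spirit of the description above, repeating this cycle and scoring the final samples with the utility function would yield the fine-tuning examples, with an abstention response substituted when no sample scores well; the details of that final selection step are again the paper's, not this sketch's.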
Results and Implications
The experimental results highlight that models trained on data generated by ReSearch outperform those using existing methods like Chain-of-Verification and naive search techniques. The percentage of false claims was reduced dramatically, particularly for less well-known topics, and models learned to abstain from answering when they lacked confident knowledge, an important step towards safer AI deployment in real-world scenarios.
The authors also evaluated their approach on external benchmarks such as FActScore and demonstrated superior performance over existing models, reinforcing the utility of ReSearch in improving factuality and reliability.
Future Directions
The implications for AI development are significant—this approach provides a structured method to instill a level of judgment in LLMs, potentially facilitating their use in applications where accuracy and reliability are critical, such as medical and legal domains.
Future research could focus on integrating retrieval-augmented generation to further enhance model reliability by combining internal knowledge processing with external information retrieval. Additionally, exploring more refined utility functions and optimization strategies could broaden the applicability of ReSearch to other domains and tasks.
Conclusion
This paper provides a compelling methodology for improving LLM reliability through self-restraint and self-reflection. By fine-tuning models on utility-guided, self-generated data, the approach opens new pathways for developing LLMs that can autonomously regulate their output quality, paving the way for more responsible AI deployment in sensitive applications. The advancements presented here are a significant step towards more trustworthy and accurate LLMs that do not over-rely on human intervention or external data sources.