Internal Consistency and Self-Feedback in LLMs: A Survey
LLMs have demonstrated extensive capabilities across a wide range of natural language tasks, yet they frequently exhibit flawed reasoning and hallucinated content. Such deficiencies highlight the difficulty of maintaining internal consistency within LLMs. In their comprehensive survey, Liang et al. propose a theoretical framework, Internal Consistency, to characterize these challenges, and introduce Self-Feedback as a strategy for enhancing model capabilities through iterative self-improvement.
Internal Consistency and its Evaluation
Internal Consistency is defined as the degree of coherence among the LLM's latent, decoding, and response layers. The proposed framework aims to assess and improve consistency at all three levels:
- Response Consistency: Uniformity of responses to the same or semantically similar queries.
- Decoding Consistency: Stability in token selection during the decoding process.
- Latent Consistency: Reliability of internal states and attention mechanisms.
The authors describe the "Hourglass Evolution of Internal Consistency," illustrating how consistency changes as information flows through the model: latent states in the lower layers are noisy and inconsistent, consistency improves through the intermediate and upper layers, and then diverges again at the response-generation stage. This pattern underscores the significant role internal mechanisms play in consistency and the distinct challenges at each stage of processing within LLMs.
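To make the response-layer notion concrete, the following is a minimal sketch (not code from the survey) of estimating response consistency by sampling several answers to the same query and averaging their pairwise agreement. The `generate` callable and the token-overlap agreement metric are illustrative assumptions; any sampling wrapper and any semantic-similarity measure could stand in.

```python
from itertools import combinations
from typing import Callable, List

def response_consistency(generate: Callable[[str], str], query: str, k: int = 5) -> float:
    """Estimate response-layer consistency as the mean pairwise agreement
    among k sampled answers to the same query.

    `generate` is a hypothetical sampling wrapper around an LLM
    (temperature > 0, so repeated calls can differ).
    """
    answers: List[str] = [generate(query) for _ in range(k)]

    def agree(a: str, b: str) -> float:
        # Token-overlap (Jaccard) agreement; a semantic-similarity
        # model could be substituted here.
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / max(len(ta | tb), 1)

    pairs = list(combinations(answers, 2))
    return sum(agree(a, b) for a, b in pairs) / max(len(pairs), 1)
```

A score near 1 means the model answers the query the same way on every sample; a low score signals divergence at the response layer, the outermost stage of the hourglass described above.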
Self-Feedback Framework
The Self-Feedback framework proposed by Liang et al. involves two core modules: Self-Evaluation and Self-Update. The LLM first performs Self-Evaluation by analyzing its own outputs, then uses the resulting feedback to refine its responses or to update its internal parameters. This self-improving feedback loop is fundamental to addressing reasoning failures and hallucination.
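As a rough illustration, the sketch below implements the response-refinement variant of this loop at the prompt level (parameter updates being the other option the survey describes). It assumes only a generic text-in/text-out `llm` callable; the prompts and stopping rule are illustrative, not the survey's reference implementation.

```python
from typing import Callable

def self_feedback_loop(
    llm: Callable[[str], str],   # hypothetical text-in/text-out model wrapper
    task: str,
    max_rounds: int = 3,
) -> str:
    """Iteratively refine an answer: Self-Evaluation produces a critique,
    Self-Update revises the answer using that critique."""
    answer = llm(task)
    for _ in range(max_rounds):
        # Self-Evaluation: the model critiques its own output.
        critique = llm(
            f"Task: {task}\nAnswer: {answer}\n"
            "Point out any inconsistencies or errors, or reply 'OK' if none."
        )
        if critique.strip().upper().startswith("OK"):
            break
        # Self-Update: the model revises its answer given the feedback.
        answer = llm(
            f"Task: {task}\nPrevious answer: {answer}\n"
            f"Feedback: {critique}\nProvide an improved answer."
        )
    return answer
```

Stopping when the critique reports no issues keeps the loop cheap; in practice a quantitative consistency signal (see the next section) often replaces or supplements this purely verbal check.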
Consistency Signal Acquisition
A crucial part of the Self-Feedback framework is the acquisition of consistency signals. The authors categorize these methods into six primary lines of work:
- Uncertainty Estimation: Estimating the model's uncertainty in its outputs to guide refinement.
- Confidence Estimation: Quantifying the model's confidence in its responses.
- Hallucination Detection: Identifying and mitigating unfaithful or incorrect content generated by the model.
- Verbal Critiquing: Allowing the LLM to generate critiques of its own outputs to facilitate iterative improvement.
- Contrastive Optimization: Optimizing the model by comparing different outputs and selecting the best.
- External Feedback: Leveraging external tools or more robust models to provide feedback on generated content.
Each method plays a distinct role in refining model outputs and enhancing internal consistency.
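For example, one widely used uncertainty-estimation signal is the mean token-level entropy of the model's output distribution: higher entropy suggests lower certainty and therefore a weaker case for accepting the output without refinement. The sketch below assumes access to per-step log-probabilities via a hypothetical `step_logprobs` structure rather than any specific API.

```python
import math
from typing import List

def mean_token_entropy(step_logprobs: List[List[float]]) -> float:
    """Average Shannon entropy (in nats) over decoding steps.

    `step_logprobs[t]` holds log-probabilities of the top candidate tokens
    at decoding step t (a hypothetical structure; real APIs differ).
    Higher entropy -> higher uncertainty -> weaker consistency signal.
    """
    entropies = []
    for logps in step_logprobs:
        probs = [math.exp(lp) for lp in logps]
        z = sum(probs)                      # renormalize the truncated distribution
        probs = [p / z for p in probs]
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0))
    return sum(entropies) / max(len(entropies), 1)
```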
Critical Viewpoints and Future Directions
The survey introduces several critical viewpoints such as the "Consistency Is (Almost) Correctness" hypothesis, which posits that increasing a model's internal consistency generally results in improved overall correctness. This is predicated on the assumption that pre-training corpora are predominantly aligned with correct world knowledge.
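A familiar illustration of this hypothesis (a sketch, not an experiment from the survey) is majority voting over several sampled answers: if the corpus-induced answer distribution places most of its mass on the correct answer, then the most consistent, i.e. most frequent, answer will usually also be the correct one. The `sample_answer` callable is an assumed sampling wrapper.

```python
from collections import Counter
from typing import Callable

def most_consistent_answer(sample_answer: Callable[[str], str],
                           question: str, k: int = 10) -> str:
    """Self-consistency-style voting: sample k answers and return the most
    frequent one. Under the 'consistency is (almost) correctness' assumption,
    the majority answer tends to be the correct one."""
    votes = Counter(sample_answer(question) for _ in range(k))
    return votes.most_common(1)[0][0]
```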
Despite advancements, several challenges and future directions emerge from the survey:
- Textual Self-Awareness: Improving LLMs' ability to express their degrees of certainty and uncertainty in textual form.
- The Reasoning Paradox: Balancing latent and explicit reasoning so that gains in reasoning quality do not come at the cost of inference efficiency.
- Deeper Investigation: Moving beyond response-level improvements to explore decoding and latent states comprehensively.
- Unified Perspective: Integrating improvements across response, decoding, and latent layers to form a cohesive improvement strategy.
- Comprehensive Evaluation: Establishing robust evaluation metrics and benchmarks to holistically assess LLM capabilities and improvements.
Implications and Conclusions
The implications of refining LLMs through Internal Consistency Mining are manifold. Enhanced consistency not only improves the reliability of LLMs in various applications but also strengthens their alignment with human-like reasoning. The Self-Feedback framework represents a significant step forward in the iterative improvement of LLMs by leveraging their own feedback mechanisms.
In conclusion, this survey by Liang et al. offers a comprehensive and structured approach to addressing the core issues of reasoning and hallucination in LLMs through the lens of internal consistency. By systematically categorizing and evaluating various methods, the paper lays a solid foundation for ongoing and future research in enhancing the robustness and reliability of LLMs.