An Analysis of the CRITIC Framework for Self-Correcting LLMs
The paper "CRITIC: LLMs Can Self-Correct with Tool-Interactive Critiquing" addresses a significant challenge within the domain of LLMs—the propensity for these models to generate erroneous, fictitious, or harmful content. This research advances the capability of LLMs by introducing a structured framework, CRITIC, which enables these models to engage in a self-correction process by interacting with external tools.
LLMs are increasingly deployed in applications ranging from conversational agents to programming assistants (e.g., ChatGPT). These applications often suffer from inaccurate information, erroneous code, and offensive outputs stemming from intrinsic limitations of the models. To address these issues, the authors propose CRITIC, which extends the traditional one-shot "generate" behavior of LLMs into an iterative "generate, verify, and correct" loop. Inspired by how humans consult tools for fact-checking or debugging, CRITIC couples an LLM's own outputs with validation provided by external tools.
The CRITIC framework proceeds in three phases: (1) initial output generation based on the model's internal knowledge alone; (2) tool-interactive critiquing, in which an external tool, such as a web search engine or a code interpreter, is queried to provide feedback on specific attributes of the generated output; and (3) correction, in which the output is revised in light of that feedback. The critique-and-correct cycle can repeat until the feedback reports no further problems or an iteration limit is reached, yielding a more reliable final output.
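In code, the loop can be sketched as follows. The helpers call_llm and search_tool are hypothetical placeholders for an LLM API and an external tool (for example, a web search), so this is a minimal illustration of the generate-verify-correct cycle rather than the paper's implementation:

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API here")

def search_tool(query: str) -> str:
    raise NotImplementedError("plug in an external tool, e.g. a search engine")

def critic(question: str, max_rounds: int = 3) -> str:
    # (1) Initial answer produced from the model's internal knowledge only.
    answer = call_llm(f"Question: {question}\nAnswer:")
    for _ in range(max_rounds):
        # (2) Tool-interactive critique: gather external evidence and ask the
        #     model to check its own answer against it.
        evidence = search_tool(f"{question} {answer}")
        critique = call_llm(
            f"Question: {question}\nAnswer: {answer}\nEvidence: {evidence}\n"
            "Does the evidence support the answer? Point out any errors."
        )
        if "no error" in critique.lower():  # naive stopping check, for illustration only
            break
        # (3) Correction: revise the answer conditioned on the critique.
        answer = call_llm(
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"Critique: {critique}\nRevised answer:"
        )
    return answer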
CRITIC is evaluated on three proof-of-concept tasks: free-form question answering, mathematical program synthesis, and toxicity reduction. Improvements in F1 scores and reductions in toxicity probability underscore the framework's efficacy. The experiments cover several LLM variants, including ChatGPT and LLaMA-2 models, demonstrating that the framework adapts across model architectures.
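For the program-synthesis task, the external tool is essentially an interpreter: the generated program is executed, and its result or error trace becomes the critique. The sketch below illustrates this idea; it is a simplified stand-in for the paper's setup, and the variable name answer is an assumed convention for the program's output.

import traceback

def execute_program(code: str) -> str:
    # Run a model-generated Python program and report either its "answer"
    # variable or the full error trace as feedback for the next correction.
    namespace: dict = {}
    try:
        exec(code, namespace)
        return f"Execution succeeded, answer = {namespace.get('answer')}"
    except Exception:
        return "Execution failed:\n" + traceback.format_exc()

# A buggy candidate program yields an error message that the LLM can use
# as a critique when revising its solution.
print(execute_program("answer = 10 / 0"))  # reports a ZeroDivisionError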
Numerically, CRITIC delivers consistent gains over baseline prompting with the same underlying models across tasks. When applied to ChatGPT on free-form QA, for instance, it improves F1 by 7.7 points, evidence of its potential to produce more truthful outputs. Toxicity in generated text also drops substantially when CRITIC is applied. These results not only demonstrate a more reliable approach to LLM output generation but also illustrate the critical role that external validation tools play in self-correction.
The CRITIC framework marks a paradigm shift from passive reliance on a single LLM pass to an interactive, iterative process that borrows critical-evaluation habits from human cognition. This approach contributes theoretically to the concept of machine self-improvement and practically to producing higher-quality machine-generated content, and it points to future gains in content reliability and safety.
The implications for AI advancement are notable. CRITIC suggests that future systems could improve their output quality autonomously by drawing on feedback from tool interactions at inference time, removing a significant portion of the manual verification currently required. This points toward more autonomous AI systems with a higher degree of reliability and ethical performance.
In conclusion, CRITIC establishes a framework that addresses shortcomings of current LLMs without any additional training, highlighting the need for LLMs to engage effectively with verification tools. The paper marks a significant step toward more trustworthy AI, bridging gaps in current LLM-based systems and setting a direction for future research on model alignment and precision.