Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic

Published 29 Aug 2024 in cs.CL | (2408.16326v3)

Abstract: Self-critic has become a crucial mechanism for enhancing the reasoning performance of LLMs. However, current approaches mainly involve basic prompts for intuitive instance-level feedback, which resembles System-1 processes and limits the reasoning capabilities. Moreover, there is a lack of in-depth investigations into the relationship between LLM's ability to criticize and its task-solving performance. To address these issues, we propose Critic-CoT, a novel framework that pushes LLMs toward System-2-like critic capability. Through a step-wise CoT reasoning paradigm and the automatic construction of distant-supervision data without human annotation, Critic-CoT enables LLMs to engage in slow, analytic self-critique and refinement, thereby improving their reasoning abilities. Experiments on GSM8K and MATH demonstrate that our enhanced model significantly boosts task-solving performance by filtering out invalid solutions or iterative refinement. Furthermore, we investigate the intrinsic correlation between critique and task-solving abilities within LLMs, discovering that these abilities can mutually reinforce each other rather than conflict.


Authors (10)

Citations (2)


Summary

The paper introduces Critic-CoT, a framework that improves LLM reasoning by training the model to apply System-2-like critique to its own outputs and refine them.
It employs a step-by-step chain-of-thought strategy to systematically assess and improve each part of the reasoning process.
Distant-supervision data construction enables scalable training without human annotations, yielding robust results on GSM8K and MATH.

The paper "Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic" introduces Critic-CoT, a framework that strengthens the reasoning of LLMs by having the model critique its own solutions step by step in Chain-of-Thought (CoT) form. The authors identify a shortcoming in current self-critique mechanisms: they typically rely on simple prompting strategies without additional training, which yields intuitive, instance-level feedback resembling System-1 processing and leads to unsatisfactory accuracy. The paper also examines the relationship between an LLM's ability to critique its outputs and its task-solving performance, a question that had not been investigated in depth.

To address these challenges, the Critic-CoT framework revamps the reasoning process of LLMs through a structured approach:

System-2-like Critic Capability: The framework is designed to emulate higher-order cognitive reasoning (similar to System-2 thinking), enabling the model to assess and enhance its outputs more effectively.
Step-wise CoT Reasoning: By using a step-by-step reasoning progression, the model can better critique each part of its thought process, allowing for more refined conclusions.
Distant-supervision Data Construction: Training data for critique is constructed automatically through distant supervision, with no human annotation, which keeps the approach scalable.
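As a rough illustration of how step-wise critique and distant supervision could fit together, the sketch below keeps a sampled critique for training only when its overall verdict agrees with whether the solution's final answer matches the reference answer. This filtering rule, the `Critique` class, and the `keep_for_training` helper are assumptions made for illustration; the summary above does not specify the paper's exact procedure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Critique:
    """Hypothetical container: one verdict per solution step (True = step judged correct)."""
    step_verdicts: List[bool]

    def first_error(self) -> Optional[int]:
        """Return the index of the first step judged incorrect, or None if all pass."""
        for index, step_ok in enumerate(self.step_verdicts):
            if not step_ok:
                return index
        return None


def keep_for_training(critique: Critique, predicted: str, gold: str) -> bool:
    """Assumed distant-supervision rule (not taken from the paper).

    The reference answer acts as a weak label for the critique:
    - a solution with the correct final answer should have no flagged step;
    - a solution with a wrong final answer should have at least one flagged step.
    Critiques that disagree with this weak label are discarded.
    """
    answer_correct = predicted.strip() == gold.strip()
    critic_accepts = critique.first_error() is None
    return answer_correct == critic_accepts
```

Under this assumed rule, no human ever inspects an individual step: the only supervision signal is the final-answer match, which is what makes the label "distant".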

Experiments on GSM8K and MATH show that the trained model improves task-solving performance in two ways: by filtering out invalid solutions and by iteratively refining them. The results also suggest that critique and task-solving abilities reinforce each other rather than conflict, so that training on critique and refinement alone can improve the model's ability to generate solutions.
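The two inference-time uses mentioned here, filtering and iterative refinement, might be sketched as follows. The `Critic` and `Refiner` callables stand in for model calls and are hypothetical, as are the function names, the majority-vote aggregation, the fallback when every sample is rejected, and the `max_rounds` limit; none of these details come from the summary.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# A solution is (reasoning steps, final answer).
Solution = Tuple[List[str], str]
# Hypothetical critic: returns the index of the first erroneous step, or None.
Critic = Callable[[Solution], Optional[int]]
# Hypothetical refiner: rewrites a solution given the index of the flagged step.
Refiner = Callable[[Solution, int], Solution]


def filtered_vote(samples: List[Solution], critic: Critic) -> Optional[str]:
    """Majority vote over the answers whose solutions the critic accepts."""
    accepted = [sample[1] for sample in samples if critic(sample) is None]
    # Assumed fallback: if the critic rejects everything, vote over all samples.
    pool = accepted or [sample[1] for sample in samples]
    if not pool:
        return None
    return Counter(pool).most_common(1)[0][0]


def refine(solution: Solution, critic: Critic, refiner: Refiner,
           max_rounds: int = 3) -> Solution:
    """Critique and rewrite until the critic accepts or the round budget runs out."""
    for _ in range(max_rounds):
        bad_step = critic(solution)
        if bad_step is None:
            return solution
        solution = refiner(solution, bad_step)
    return solution
```

Filtering trades extra sampling for robustness, while refinement spends its budget on repairing a single solution; the two could also be combined by refining each sample before voting.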

The findings imply that structured, step-wise critique could be an important ingredient in further improving the reasoning and self-critique abilities of LLMs. The authors hope the work offers insight into improving LLM performance through better reasoning frameworks.
