Enhancing LLM Reasoning with Collaborative Verification
The paper, "Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification," explores a notable challenge in LLMs: the capacity for consistent and accurate reasoning, particularly in complex mathematical and coding tasks. The authors identify that LLMs, despite significant advancements, remain constrained largely due to their training predominantly on correct solutions, which impedes their proficiency in detecting and learning from erroneous reasoning paths.
Methodology and Approach
The authors propose an approach termed "Collaborative Verification," which scales inference-time computation by generating multiple reasoning paths per problem. Verifiers then assess and rank the generated solutions by their likely correctness. A pivotal element of this approach is a dataset containing both correct and incorrect solutions produced by various LLMs, which enables the verifiers to learn to distinguish sound reasoning from flawed reasoning.
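The core inference loop can be summarized as best-of-N selection: sample several candidate solutions and return the one the verifier scores highest. The sketch below illustrates this pattern; `generate_solutions` and `verifier_score` are hypothetical stand-ins for the paper's generator and trained verifier, not its actual API.

```python
# Minimal best-of-N sketch: sample several reasoning paths, score each with a
# verifier, and return the highest-scoring candidate.
from typing import Callable, List

def best_of_n(
    question: str,
    generate_solutions: Callable[[str, int], List[str]],  # hypothetical sampler
    verifier_score: Callable[[str, str], float],           # hypothetical verifier
    n: int = 16,
) -> str:
    candidates = generate_solutions(question, n)            # n sampled reasoning paths
    scores = [verifier_score(question, c) for c in candidates]
    best_idx = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_idx]
```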
Training the verifiers is central to the paper's methodology. After a comparative analysis of existing techniques, the authors select preference tuning, specifically SimPO, as the most suitable method for training these verifiers. This choice avoids the additional parameters introduced by outcome reward models (ORMs) and keeps the verifier aligned with the LLM's native generative capabilities.
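For intuition, SimPO scores a sequence by its length-normalized log-probability and pushes correct solutions above incorrect ones by a margin, with no reference model. The snippet below is a sketch of that loss in PyTorch, not the authors' exact training code; tensor names, shapes, and default hyperparameters are assumptions.

```python
# Illustrative SimPO-style preference loss (sketch, assuming batched log-probs).
import torch
import torch.nn.functional as F

def simpo_loss(
    chosen_logps: torch.Tensor,    # summed token log-probs of correct solutions, shape (B,)
    rejected_logps: torch.Tensor,  # summed token log-probs of incorrect solutions, shape (B,)
    chosen_lens: torch.Tensor,     # token counts of correct solutions, shape (B,)
    rejected_lens: torch.Tensor,   # token counts of incorrect solutions, shape (B,)
    beta: float = 2.0,
    gamma: float = 0.5,
) -> torch.Tensor:
    # Length-normalized implicit rewards (no reference model needed).
    chosen_reward = beta * chosen_logps / chosen_lens
    rejected_reward = beta * rejected_logps / rejected_lens
    # Encourage correct solutions to beat incorrect ones by at least gamma.
    return -F.logsigmoid(chosen_reward - rejected_reward - gamma).mean()
```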
Another contribution is the collaborative integration of Chain-of-Thought (CoT) and Program-of-Thought (PoT) reasoning. CoT offers interpretability through step-by-step natural-language reasoning, while PoT provides executable precision and sensitivity to errors. Through this combined strategy, named CoTnPoT, the paper achieves significant improvements in reasoning verification.
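One way to read the CoTnPoT idea is as a cross-check: translate a CoT solution into a short program, execute it, and keep the CoT solution only if the executed result agrees with its final answer. The sketch below illustrates this under that assumption; `translate_to_program`, `run_program`, and `extract_answer` are hypothetical helpers, not the paper's actual interface.

```python
# Sketch of a CoT/PoT cross-check: a CoT solution passes the filter only if an
# executable PoT counterpart reproduces its final answer.
from typing import Callable, Optional

def cot_and_pot_agree(
    question: str,
    cot_solution: str,
    translate_to_program: Callable[[str, str], str],  # hypothetical CoT -> code step
    run_program: Callable[[str], Optional[str]],      # hypothetical sandboxed executor
    extract_answer: Callable[[str], Optional[str]],   # hypothetical answer parser
) -> bool:
    program = translate_to_program(question, cot_solution)  # PoT counterpart of the CoT
    executed_answer = run_program(program)                   # executable precision
    cot_answer = extract_answer(cot_solution)                # final answer stated in the CoT
    return executed_answer is not None and executed_answer == cot_answer
```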
Empirical Results
The verifiers, Math-Rev and Code-Rev, deliver substantial gains on benchmarks such as GSM8k and MATH. Notably, Math-Rev paired with Qwen-72B-Instruct surpasses previous state-of-the-art results and even exceeds GPT-4o. The gains come from inference-time techniques: sampling multiple candidate solutions and scoring them with the verifiers.
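A common way to aggregate multiple scored samples is verifier-weighted voting, where verifier scores are summed per distinct final answer; the snippet below shows this pattern as an illustration of combining sampling with verifier scores, not necessarily the exact aggregation used in the paper.

```python
# Verifier-weighted voting over sampled solutions (illustrative aggregation).
from collections import defaultdict
from typing import Dict, List, Tuple

def weighted_vote(scored_solutions: List[Tuple[str, float]]) -> str:
    """scored_solutions: (final_answer, verifier_score) pairs, one per sample."""
    totals: Dict[str, float] = defaultdict(float)
    for answer, score in scored_solutions:
        totals[answer] += score          # sum verifier scores per distinct answer
    return max(totals, key=totals.get)   # answer with the highest total score

# Three samples agree on "42" with moderate scores; a lone outlier scores higher alone.
print(weighted_vote([("42", 0.7), ("42", 0.6), ("17", 0.9), ("42", 0.5)]))  # -> "42"
```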
A further finding is that the performance improvements hold for both in-distribution (ID) and out-of-distribution (OOD) tasks, indicating that the method generalizes across diverse LLMs and datasets. In addition, CoTnPoT filtering combined with the verifiers screens out flawed solutions effectively, yielding robust accuracy gains, particularly for weaker LLMs.
Implications and Future Directions
This research offers several advances for LLM reasoning. Practically, scaling inference-time computation and combining complementary reasoning formats improve the reliability and precision of LLM-generated solutions. Theoretically, the work underscores the value of learning from errors and of collaborative verification in augmenting LLM capabilities.
Looking forward, exploring more sophisticated verifier training techniques, particularly integrating process reward models (PRMs) for more granular feedback, would be beneficial. Moreover, the expansion of datasets and inclusion of diverse reasoning tasks could further optimize verifier training and application.
In summary, this paper makes considerable strides on a long-standing challenge in LLM reasoning by pairing trained verifiers with complementary reasoning strategies. The advance holds promise not only for improving benchmark performance but also for broadening the applications of LLMs in complex reasoning domains.