Improving Coordination Between Decomposer, Solver, and Verifier Models in LLMs for Complex Reasoning
Introduction to the Proposed LM Model
In the field of AI research focused on LLMs, executing complex, multi-step reasoning tasks remains a significant hurdle. Recent work has offered novel solutions, among which the integration of decomposer and solver models has shown considerable promise. However, these solutions are often limited by the lack of an effective feedback loop between the decomposing and solving phases of problem-solving. To address this gap, the paper introduces a multi-LLM framework titled "LLM Multiplex" (LM) that modularly distributes decomposition, solution, and verification tasks across three distinct LLMs.
Key Contributions of LM
- Introduction of a Verifier Model: A verifier model checks the solutions provided by the solver model, introducing an additional layer of scrutiny that enriches the reasoning feedback loop.
- Dynamic Coordination Through Policy Learning: The LM framework fine-tunes the interaction among its components (decomposer, solver, verifier) using policy learning, a novel way of improving model coordination based on direct feedback (a minimal sketch of this coordination loop follows the list).
- Enhanced Performance on Complex Reasoning Tasks: Experiments show LM outperforming existing methods on both in-domain and out-of-domain reasoning tasks, with an 8.1% accuracy improvement on the MATH dataset and notable gains on JEEBench and MedQA problems.
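The coordination described above can be pictured as a loop in which the verifier's verdict on each sub-answer steers the decomposer's next step. The Python sketch below illustrates that loop under stated assumptions: the function names (`decompose`, `solve`, `verify`), the retry logic, and the history format are illustrative, not the paper's implementation.

```python
# Illustrative sketch of the decompose -> solve -> verify loop.
# All model calls are hypothetical stubs; in the paper's framework the
# decomposer and verifier are fine-tuned LLaMA-2 models and the solver
# is an unmodified GPT-3.5 model.

def decompose(question, history):
    """Hypothetical decomposer call: propose the next sub-question,
    conditioned on the original question and prior feedback."""
    raise NotImplementedError

def solve(sub_question):
    """Hypothetical solver call: answer a single sub-question."""
    raise NotImplementedError

def verify(sub_question, answer):
    """Hypothetical verifier call: return (is_acceptable, feedback)."""
    raise NotImplementedError

def answer_with_feedback(question, max_steps=8, max_retries=2):
    history = []  # (sub_question, answer, feedback) triples seen so far
    for _ in range(max_steps):
        sub_q = decompose(question, history)
        if sub_q is None:          # decomposer signals it is done
            break
        for _ in range(max_retries + 1):
            ans = solve(sub_q)
            ok, feedback = verify(sub_q, ans)
            history.append((sub_q, ans, feedback))
            if ok:                 # accepted: move on to the next sub-question
                break
            # rejected: the feedback stays in `history`, so the next
            # decomposer call can rephrase or re-target the sub-question
    return history
```

In this view, the verifier is what turns a one-way decompose-then-solve pipeline into a closed feedback loop.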
Methodological Innovations
The LM approach distinguishes itself through several methodological innovations:
- The decomposer identifies necessary concepts and generates step-by-step sub-questions, guided by the capabilities and feedback of the solver and verifier.
- The solver, powered by an unmodified GPT-3.5 model, focuses on generating responses to these sub-questions.
- The verifier, fine-tuned to classify distinct categories of potential answer errors, provides the nuanced feedback needed to refine the overall reasoning process (a small sketch of this feedback mapping follows the list).
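To make the verifier's role concrete, the sketch below shows one way classified error categories could be turned into textual feedback for the decomposer. The category names and messages are assumptions for illustration; the actual error taxonomy used to fine-tune the verifier may differ.

```python
# Hypothetical verifier error taxonomy; the paper's categories may differ.
ERROR_FEEDBACK = {
    "correct": None,
    "calculation_error": "The arithmetic in this step appears wrong; recompute it.",
    "concept_error": "A required concept was misapplied; restate the concept first.",
    "irrelevant_step": "This step does not help answer the original question; redirect it.",
    "incomplete_answer": "The step stops short of a usable result; ask for the missing part.",
}

def feedback_message(error_label):
    """Map a verifier label to the textual feedback passed back to the decomposer."""
    return ERROR_FEEDBACK.get(error_label)
```

Under this reading, the verifier behaves like a classifier whose label selects the corrective instruction appended to the decomposer's context.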
Training and Results
The paper details an extensive training regimen in which the LLaMA-2 model underlying both the decomposer and the verifier components is fine-tuned. A notable aspect of the training process is the use of Proximal Policy Optimization (PPO) to integrate feedback, which contributes significantly to the model's robustness on complex reasoning sequences.
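For readers unfamiliar with PPO, the sketch below shows the standard clipped surrogate objective that this kind of feedback-driven fine-tuning optimizes. The reward shaping hinted at in the comments (treating verifier acceptance and final-answer correctness as scalar rewards) is an assumption for illustration; the paper's exact reward design and PPO configuration may differ.

```python
import torch

def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss over a batch of generated steps.

    logp_new:   log-probabilities under the policy being updated (e.g. the decomposer)
    logp_old:   log-probabilities under the policy that generated the samples
    advantages: advantage estimates derived from the reward signal
                (assumed here to combine verifier feedback and answer correctness)
    """
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy usage with dummy tensors:
logp_new = torch.tensor([-1.0, -0.5, -2.0], requires_grad=True)
logp_old = torch.tensor([-1.1, -0.7, -1.8])
advantages = torch.tensor([0.5, 1.0, -0.3])  # e.g. positive when the verifier accepts a step
loss = ppo_clipped_loss(logp_new, logp_old, advantages)
loss.backward()
```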
Empirical results confirm LM's effectiveness, with marked improvements over baseline models on several benchmark datasets. Most notably, despite being fine-tuned on mathematical reasoning problems, LM generalizes well across diverse reasoning tasks, including chemistry and medical question answering.
A Discussion on Implications and Future Directions
The introduction of LM sets a new benchmark in the pursuit of more effective multi-step reasoning within LLMs. The model's ability to dynamically adjust the decomposition of questions based on ongoing solver performance and verifier feedback opens new avenues for research into LLMs' capabilities in handling intricate reasoning tasks.
Looking forward, the LM framework's modularity hints at the potential for incorporating more specialized models for each of its components, possibly enabling even finer-grained reasoning abilities. Moreover, the structured feedback mechanism introduced by the verifier model suggests intriguing research directions in improving error detection and correction methodologies within LLMs.
In conclusion, the LLM Multiplex framework represents a significant step forward in the development of LLMs capable of nuanced and complex reasoning across varied domains. With its innovative structure and promising initial results, LM paves the way for future advances in AI's reasoning capabilities, offering a blueprint for enhancing the interplay between decomposing, solving, and verifying components of LLM-based reasoning systems.