Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains (2501.05707v1)

Published 10 Jan 2025 in cs.CL, cs.AI, and cs.LG

Abstract: LLMs have achieved remarkable performance in recent years but are fundamentally limited by the underlying training data. To improve models beyond the training data, recent works have explored how LLMs can be used to generate synthetic data for autonomous self-improvement. However, successive steps of self-improvement can reach a point of diminishing returns. In this work, we propose a complementary approach towards self-improvement where finetuning is applied to a multiagent society of LLMs. A group of LLMs, all starting from the same base model, are independently specialized by updating each one using data generated through multiagent interactions among the models. By training each model on independent sets of data, we illustrate how this approach enables specialization across models and diversification over the set of models. As a result, our overall system is able to preserve diverse reasoning chains and autonomously improve over many more rounds of fine-tuning than single-agent self-improvement methods. We quantitatively illustrate the efficacy of the approach across a wide suite of reasoning tasks.


Multiagent Finetuning: Advancing Self-Improvement in LLMs

Self-improvement, in which a model is finetuned on data it generates itself, has emerged as a promising way to extend the capabilities of LLMs, which are otherwise bounded by their training data. The paper "Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains" proposes to overcome this limitation with a multiagent system that preserves diverse reasoning chains. Compared with single-agent self-improvement, the method sustains performance gains across more rounds of finetuning.

Summary of Methodology

The authors propose a multiagent finetuning approach applied to a society of LLMs, all originating from the same base model. Each model in this society is independently specialized on data generated through interactions among the models themselves. This differs from single-agent self-improvement, which tends to plateau quickly because the generated data lacks diversity. The process can be summarized in four steps:

Initialization and Model Role Assignment: A set of LLMs is initialized from a single base model. The models are divided into generation agents, which produce responses, and critic agents, which refine those responses.
Data Generation via Multiagent Interaction: Using a multiagent debate framework, models iteratively generate and refine responses. This debate process generates a diverse set of reasoning chains, which act as training data. Each agent provides a unique perspective, promoting specialization.
Finetuning: Each model is fine-tuned on a distinct subset of data corresponding to its generated outputs. This encourages specialization, ensuring that each model develops distinct reasoning strategies.
Iterative Self-Improvement: The process iterates, with models continually fine-tuned on newly generated data, further enhancing specialization and collective diversity in reasoning.
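
The data-generation and per-agent dataset steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Agent` callable type, the `debate`, `majority_vote`, and `build_datasets` helpers, and the toy agents are all hypothetical names introduced here. In particular, keeping only the first-round answers that agree with the majority vote of the final round is an assumed filtering rule, and the critic-agent side and the finetuning call itself are omitted.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical agent interface: (question, other agents' previous answers) -> answer.
Agent = Callable[[str, List[str]], str]


def debate(question: str, agents: List[Agent], rounds: int = 2) -> List[List[str]]:
    """Run a simple debate; history[r][i] is agent i's answer in round r."""
    history: List[List[str]] = []
    previous: List[str] = []
    for _ in range(rounds):
        current = []
        for i, agent in enumerate(agents):
            # Each agent sees the other agents' answers from the previous round.
            others = [a for j, a in enumerate(previous) if j != i]
            current.append(agent(question, others))
        history.append(current)
        previous = current
    return history


def majority_vote(answers: List[str]) -> str:
    """Return the most common answer (ties go to the first one seen)."""
    return Counter(answers).most_common(1)[0][0]


def build_datasets(
    questions: List[str], agents: List[Agent], rounds: int = 2
) -> Dict[int, List[Tuple[str, str]]]:
    """Build one dataset per agent from that agent's own outputs.

    Assumed filter: keep agent i's first-round answer only when it matches
    the majority vote over the final debate round.
    """
    datasets: Dict[int, List[Tuple[str, str]]] = {i: [] for i in range(len(agents))}
    for q in questions:
        history = debate(q, agents, rounds)
        consensus = majority_vote(history[-1])
        for i, first_answer in enumerate(history[0]):
            if first_answer == consensus:
                datasets[i].append((q, first_answer))
    return datasets


# Toy stand-ins for LLM agents (purely illustrative).
def always(answer: str) -> Agent:
    return lambda question, others: answer


def follower(initial: str) -> Agent:
    def agent(question: str, others: List[str]) -> str:
        # Adopt the majority of the other agents once their answers are visible.
        return majority_vote(others) if others else initial
    return agent


agents = [always("4"), always("4"), follower("5")]
datasets = build_datasets(["2+2"], agents)
```

Because each agent's dataset contains only its own retained outputs, the datasets differ across agents, which is what drives the specialization described above. In an iterative setup, each agent would be finetuned on its dataset and the loop repeated with the updated agents.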

Experimental Results

The efficacy of this multiagent finetuning approach is demonstrated across various reasoning tasks. Notably, the system shows sustained improvements over multiple finetuning rounds, unlike traditional methods where performance gains tend to saturate. In empirical evaluations on datasets such as MATH and GSM, the multiagent system outperforms both the single-agent counterparts and other baselines like STaR and debate methods. This indicates the method's ability to maintain and even increase diversity and reasoning capabilities without additional external supervision or data.

Further, the system exhibits zero-shot generalization: models finetuned on one dataset (e.g., MATH) perform well on another dataset (e.g., GSM) without additional training. This suggests that the reasoning skills the models acquire transfer across tasks rather than being specific to the finetuning dataset.

Theoretical and Practical Implications

Theoretically, this work suggests that diversity in training data—achieved through multiagent generative approaches—can mitigate the limitations of homogeneous training sets. Practically, the approach offers a pathway to developing more adaptive and broadly applicable LLMs without the need for expensive and often restricted ground-truth annotations.
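
To make the notion of diversity concrete, the sketch below computes a crude proxy: the mean pairwise Jaccard distance between the word sets of a group of responses. This is an assumption chosen for illustration, not the metric used in the paper, and the function names `jaccard_distance` and `mean_pairwise_diversity` are hypothetical. A value near 0 means the responses share almost all their words; a value near 1 means they share almost none.

```python
from itertools import combinations
from typing import List


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over lowercase whitespace-separated tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0  # two empty responses are treated as identical
    return 1.0 - len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def mean_pairwise_diversity(responses: List[str]) -> float:
    """Average Jaccard distance over all unordered pairs of responses."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 0.0  # fewer than two responses: no diversity to measure
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)
```

Tracking such a score across finetuning rounds would give a rough indication of whether the set of models is collapsing toward homogeneous outputs; an embedding-based or likelihood-based measure would be a more faithful choice for actual reasoning chains.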

The explicitly defined roles within the multiagent system (generation and critic agents) also provide a structured way to produce and refine reasoning chains, which may influence future architectures and training methodologies in AI research.

Future Directions

While the paper focuses on LLMs, the underlying principles of multiagent finetuning could extend to other domains like robotics or autonomous systems, where diverse task handling is crucial. Future research might explore hybrid models that integrate human feedback with multiagent interaction, potentially offering a balance between autonomous learning and alignment with human values.

In conclusion, the paper shows that finetuning a society of specialized models, rather than a single model, can sustain autonomous improvement in LLMs over more rounds of finetuning while preserving diverse reasoning chains across a range of reasoning tasks.


Authors

Vighnesh Subramaniam
Yilun Du
Joshua B. Tenenbaum
Antonio Torralba
Shuang Li
Igor Mordatch

