
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains

Published 10 Jan 2025 in cs.CL, cs.AI, and cs.LG | (2501.05707v2)

Abstract: LLMs have achieved remarkable performance in recent years but are fundamentally limited by the underlying training data. To improve models beyond the training data, recent works have explored how LLMs can be used to generate synthetic data for autonomous self-improvement. However, successive steps of self-improvement can reach a point of diminishing returns. In this work, we propose a complementary approach towards self-improvement where finetuning is applied to a multiagent society of LLMs. A group of LLMs, all starting from the same base model, are independently specialized by updating each one using data generated through multiagent interactions among the models. By training each model on independent sets of data, we illustrate how this approach enables specialization across models and diversification over the set of models. As a result, our overall system is able to preserve diverse reasoning chains and autonomously improve over many more rounds of fine-tuning than single-agent self-improvement methods. We quantitatively illustrate the efficacy of the approach across a wide suite of reasoning tasks.

Summary

  • The paper introduces a multiagent finetuning framework that utilizes collaborative debate among specialized language models to generate refined reasoning outputs.
  • It employs generation and critic agents to iteratively fine-tune models, yielding significant performance gains on datasets like MATH and GSM.
  • The method reduces reliance on static datasets by enabling self-generated, diversified data for continuous model improvement.

Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains

Introduction

The paper "Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains" (2501.05707) introduces a novel methodology aimed at enhancing the performance of LLMs. It addresses the limitations imposed by the static nature of the training data traditionally used in LLM development, proposing a dynamic multiagent framework that leverages self-generated data for iterative refinement. This multiagent system enables specialization and diversification beyond what is achievable with single-agent finetuning, offering promising advances in reasoning task performance.

Methodology: Multiagent Finetuning

The proposed method operates on a multiagent society of LLMs derived from the same base model. Each agent is independently fine-tuned using data that emerges from multiagent interactions. This data-centric approach allows for the specialized tuning of models, fostering diverse reasoning chains that enhance task performance and enable self-improvement over successive rounds of finetuning.

Multiagent Debate Framework: The core mechanism is a debate-based interaction among multiple agents. Each agent first answers a given query; the responses are then exchanged and iteratively refined over debate rounds, with a majority vote selecting the final answer. The aim is to converge on the most accurate outputs, thereby generating a robust finetuning dataset (Figure 1).

Figure 1: Overview of Multiagent Finetuning, showcasing the debate and majority voting-based data creation process followed by specialized model finetuning.

Specialization of Agents: Two distinct roles are defined within the agent society: generation agents, which produce initial problem-solving attempts, and critic agents, tasked with evaluating and refining these solutions. This division encourages intricate feedback loops that benefit subsequent iterations of response generation and optimization.
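The debate-and-vote mechanism described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the generator and critic callables below are hypothetical stand-ins for the finetuned LLM agents.

```python
from collections import Counter

def debate_round(generators, critics, query, num_rounds=2):
    """Run one multiagent debate: generation agents draft answers, critic
    agents revise them over several rounds, and a majority vote selects
    the final answer used to build the finetuning dataset."""
    answers = [gen(query) for gen in generators]          # initial drafts
    for _ in range(num_rounds):
        # each critic sees the query plus all current answers and revises
        answers = [critic(query, answers) for critic in critics]
    final, _ = Counter(answers).most_common(1)[0]         # majority vote
    return final, answers

# Toy stand-ins: two generators say "4", one says "5"; each toy critic
# simply adopts the majority answer it sees.
gens = [lambda q: "4", lambda q: "5", lambda q: "4"]
crits = [lambda q, a: Counter(a).most_common(1)[0][0] for _ in range(3)]
final, transcript = debate_round(gens, crits, "2 + 2 = ?")
print(final)  # "4"
```

In the real system the agents are separate finetuned copies of the base model, and the debate transcripts, not just the final answers, become training data.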

Experimental Results

The methodology's efficacy is demonstrated through extensive testing across diverse reasoning tasks, including arithmetic and complex mathematical problem datasets such as MATH and GSM. Results show significant performance gains over existing methods, particularly on tasks requiring advanced reasoning and problem-solving strategies (Figure 2).

Figure 2: Multiagent finetuning significantly boosts reasoning performance over multiple rounds, as shown with the MATH dataset.

The paper reports that successive multiagent finetuning iterations continue to yield performance gains, contrasting sharply with the plateau observed in single-agent configurations. This iteration-based improvement is quantitatively validated and holds across model architectures, including both open-source models and proprietary LLMs such as GPT-3.5.

Implications and Future Directions

The introduction of multiagent finetuning heralds a shift towards more autonomous and self-sustaining model training paradigms, potentially alleviating the dependency on large, static datasets. By facilitating continual learning through synthetic data generation and agent specialization, this approach could redefine efficiency and scalability standards within the field.

Practical Applications: The framework's ability to generalize to entirely new datasets highlights its utility in real-world applications, offering LLMs a mechanism to adapt to evolving data landscapes without extensive retraining from external sources (Figure 3).

Figure 3: Iterative finetuning improvements across MATH problem levels, highlighting the models' adaptability to varying difficulty.

Future Research: Exploration into integrating this methodology with other finetuning techniques, such as human-in-the-loop systems, presents an intriguing avenue for future enhancements. Additionally, leveraging this multiagent strategy could optimize various other machine learning domains, potentially extending beyond natural language processing.

Conclusion

Overall, the paper introduces a robust advancement in LLM training methodologies, emphasizing multiagent interactions as a pathway to achieving more intelligent and adaptable AI systems. The demonstrated improvements in both performance and diversity underscore the significant potential of this approach in overcoming the innate limitations of traditional model finetuning techniques.


Explain it Like I'm 14

Multiagent Finetuning: A Simple, Teen-Friendly Explanation

What is this paper about?

This paper is about teaching LLMs (the AI behind smart chatbots) to get better at hard reasoning tasks, especially math, by working as a team. Instead of training just one model over and over (which often stops helping after a while), the authors train a group of models that play different roles, learn from each other, and keep improving for many rounds.

What questions are the researchers trying to answer?

  • Can a team of AI models improve itself using its own generated practice problems and answers?
  • If the models take on different roles (like “writer” and “editor”), will they keep a variety of problem-solving styles instead of all sounding the same?
  • Will this teamwork help the AI get better for more rounds of training than usual?
  • Can the improved models solve new problems they haven’t practiced on?

How does their method work?

Think of it like a study group with writers and editors:

  • The team starts as several copies of the same base model.
  • Some copies are “generation agents” (the writers) who produce first-draft answers to questions.
  • Other copies are “critic agents” (the editors) who read what the writers said, compare different answers, explain what’s wrong or right, and suggest improvements.
  • The group has a short “debate”: writers answer, critics review and revise, and at the end the team picks the final answer by majority vote (the answer most agents agree on).

How they train the team:

  • Each writer model is trained on the specific correct answers it personally produced. This helps different writers develop their own strengths and styles, instead of all copying the same solution.
  • Each critic model is trained on examples that show:
    • How to fix a wrong first answer into a right final answer (learning to correct mistakes).
    • How to keep a right answer right throughout the debate (learning to stay accurate).
  • They repeat this process for multiple rounds. Each round produces new training data from the group’s debates, which then helps the models improve again.
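The data-splitting recipe above can be sketched in code. The log format here is hypothetical (the paper's pipeline works from full debate transcripts), but the selection rules follow the description: writers keep only their own majority-matching answers, and critics keep trajectories that corrected a wrong draft or preserved a correct one.

```python
def build_finetuning_sets(debate_logs):
    """Split debate logs into per-agent finetuning sets. `debate_logs` is a
    hypothetical format: a list of dicts holding the query, each generation
    agent's first draft, each critic agent's (draft, revision) pair, and
    the majority-voted final answer."""
    gen_sets, critic_sets = {}, {}
    for log in debate_logs:
        final = log["final"]
        # writers: keep only self-generated answers that matched the vote
        for gid, draft in log["drafts"].items():
            if draft == final:
                gen_sets.setdefault(gid, []).append((log["query"], draft))
        # editors: keep corrections of wrong drafts and preserved right ones
        for cid, (draft, revision) in log["critiques"].items():
            corrected = draft != final and revision == final
            preserved = draft == final and revision == final
            if corrected or preserved:
                critic_sets.setdefault(cid, []).append(
                    (log["query"], draft, revision))
    return gen_sets, critic_sets

# Toy log: writer g1 was right, g2 was wrong; critic c1 fixed g2's draft,
# critic c2 kept a correct draft correct.
logs = [{
    "query": "2 + 2 = ?",
    "final": "4",
    "drafts": {"g1": "4", "g2": "5"},
    "critiques": {"c1": ("5", "4"), "c2": ("4", "4")},
}]
gen_sets, critic_sets = build_finetuning_sets(logs)
print(sorted(gen_sets), sorted(critic_sets))  # ['g1'] ['c1', 'c2']
```

Because each writer trains only on its own data, the agents drift apart over rounds, which is what preserves the diversity of reasoning styles.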

Why this helps (in everyday terms):

  • If you only train one student and make them study their own old answers, they can get stuck in a rut. But if you train a team where each person practices their own best methods and also learns from editing others, the group can keep getting smarter without collapsing into one “style.”
  • The team keeps a diversity of reasoning paths—different ways to get to the right answer—which prevents the model from becoming narrow and repetitive.

What did they test and find?

They tested on:

  • Arithmetic (basic math expressions).
  • GSM (Grade School Math): step-by-step word problems.
  • MATH: harder, competition-style math problems.

Key findings:

  • The multiagent team method beat several strong baselines, including:
    • A single model answering alone.
    • Simple majority vote across models without training.
    • Multiagent debate without this special training.
    • Popular self-training methods like STaR.
  • It keeps improving over multiple training rounds, while standard self-improvement often plateaus or declines.
    • For example, on the tough MATH dataset, one model (Phi-3) improved from about 58.8% to 66.0% accuracy over several rounds; another (Mistral) improved from about 22.5% to 28.2%.
  • The models stayed diverse in how they reasoned. That diversity helped avoid “model collapse” (when answers all start looking the same and progress stalls).
  • They also showed zero-shot generalization: a team trained on one math dataset (MATH) did very well on a different one (GSM) it hadn’t seen, even beating baselines trained directly on GSM.

Why is this important?

  • It shows a way for AI to keep getting better using its own generated data, without always needing expensive human labels or access to the most powerful (and costly) models.
  • The teamwork setup (writers + editors + debate) helps the system not only get answers right, but also learn multiple ways of thinking—useful for complicated problems.
  • It works across different base models, including open-source ones, meaning it could help many AI systems become more capable.

What are the limits and future directions?

  • Cost: Training and running several models at once is more expensive and slower than using just one. The authors suggest future tricks like sharing weights, distillation (compressing the team’s knowledge into one model), or quantization (making models smaller/faster).
  • Integration: This teamwork idea could be combined with other alignment methods (like RLHF or DPO) for even better results.

Bottom line

Treating AI models like a study team—with writers who propose solutions and critics who improve them—lets the system learn diverse, smarter reasoning paths and keep improving for many rounds. This teamwork makes the AI better at challenging tasks, especially math, and helps it generalize to new problems without extra help.
