- The paper introduces ACC-Debate, an actor-critic framework that trains LLMs for debate-driven collaborative problem-solving.
- It introduces a novel partial-trajectory reward that scores intermediate dialogue states, improving both debate accuracy and convergence.
- Empirical results show significant performance gains on datasets like BoolQ, MMLU, and ARC, underscoring its robustness.
ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration
The paper introduces ACC-Debate, an actor-critic learning framework designed to enhance the collaborative capabilities of LLMs through multi-agent debate. It presents a structured methodology for training LLMs to solve problems collaboratively via an iterative, dialogue-based process.
Introduction
The ACC-Debate framework addresses the need to train LLMs explicitly for reasoning and collaboration, rather than relying on these abilities to emerge from few-shot or zero-shot prompting. It builds upon existing multi-agent debate methodologies by integrating an actor-critic model that iteratively refines answers through structured debate, allowing specialized collaborative skills to be learned rather than merely elicited.
Methodology
Actor-Critic Debate Framework
ACC-Debate employs a two-agent system comprising an actor and a critic. The actor proposes answers during the debate, while the critic provides feedback that guides the actor toward more accurate responses. Training is framed as a bi-level optimization problem whose objective is the actor's accuracy at convergence.
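To make the interaction concrete, below is a minimal sketch of such an actor-critic debate loop. The `actor_llm` and `critic_llm` callables, the prompt wording, and the fixed round count are illustrative assumptions rather than the paper's exact setup.

```python
# Minimal sketch of a two-agent actor-critic debate loop.
# `actor_llm` and `critic_llm` are assumed callables mapping a prompt string
# to a generated string; they stand in for the trained actor and critic models.

def debate(question, actor_llm, critic_llm, num_rounds=3):
    """Run a fixed number of debate rounds and return the actor's final answer."""
    transcript = [f"Question: {question}"]
    answer = None
    for round_idx in range(num_rounds):
        # Actor proposes (or revises) an answer conditioned on the dialogue so far.
        actor_prompt = "\n".join(transcript) + "\nActor: propose your answer."
        answer = actor_llm(actor_prompt)
        transcript.append(f"Actor (round {round_idx + 1}): {answer}")

        # Critic reviews the actor's answer and offers feedback for the next round.
        critic_prompt = "\n".join(transcript) + "\nCritic: critique the answer above."
        feedback = critic_llm(critic_prompt)
        transcript.append(f"Critic (round {round_idx + 1}): {feedback}")
    return answer, transcript
```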
Partial Trajectory Reward
The paper introduces a novel "partial trajectory reward," which estimates, from the dialogue state at any point in the debate, how likely the debate is to end in a correct answer. Rewarding these intermediate states reinforces beneficial debate trajectories, improving both final accuracy and convergence.
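One plausible way to realize such a reward is to estimate, via rollouts, how often a debate continued from the current state terminates correctly. The sketch below assumes a hypothetical `continue_debate` helper and is not the paper's exact estimator.

```python
# Sketch of estimating a partial-trajectory reward by Monte Carlo rollouts.
# The reward of a partial dialogue is approximated as the fraction of continued
# debates that end with a correct final answer. `continue_debate` is an assumed
# helper that finishes a debate from the given partial transcript.

def partial_trajectory_reward(partial_transcript, gold_answer,
                              continue_debate, num_rollouts=8):
    """Estimate how promising a partial debate state is."""
    correct = 0
    for _ in range(num_rollouts):
        final_answer = continue_debate(partial_transcript)
        correct += int(final_answer.strip() == gold_answer.strip())
    return correct / num_rollouts
```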
Off-Policy Trajectory Generation
The ACC-Debate framework uses an off-policy data generation strategy known as "guided-debate" to efficiently create high-quality training data. This process involves generating potential debate paths that either support or contest the current hypothesis, allowing for effective actor-critic training through preference optimization.
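As a rough illustration, preference pairs for such training could be built by pairing a continuation guided toward the correct answer against one guided toward a distractor. The `guided_continue` helper and the data layout below are assumptions for the sketch; the paper's guided-debate procedure may differ in detail.

```python
# Sketch of building preference pairs from guided off-policy debate trajectories.
# For each question we generate one continuation nudged toward the correct answer
# and one nudged toward an incorrect one; the resulting (preferred, rejected)
# pairs can feed a preference-optimization objective such as DPO.

def build_preference_pairs(examples, guided_continue):
    """Return (prompt, preferred, rejected) triples for preference optimization."""
    pairs = []
    for question, gold_answer, wrong_answer in examples:
        prompt = f"Question: {question}"
        helpful = guided_continue(prompt, hint=gold_answer)   # steered toward correct
        harmful = guided_continue(prompt, hint=wrong_answer)  # steered toward incorrect
        pairs.append((prompt, helpful, harmful))
    return pairs
```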
Figure 1: ACC-Debate training pipeline.
Experiments and Results
The paper reports that ACC-Debate outperforms state-of-the-art debate techniques across diverse benchmarks, including BoolQ, MMLU, BBH, SciQ, and ARC. The experimental results show that debate among trained agents significantly enhances performance compared to traditional approaches.
Figure 2: Percent improvement in accuracy after five rounds of debate, compared to a single round.
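The improvement equation referenced in the original caption is not reproduced in this summary; a standard form of the metric, assuming $\mathrm{Acc}_t$ denotes accuracy after round $t$ of debate, would be:

\[
\text{Improvement}(\%) \;=\; 100 \times \frac{\mathrm{Acc}_{5} - \mathrm{Acc}_{1}}{\mathrm{Acc}_{1}}
\]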
The performance gains were consistent across varying datasets and debate scenarios, indicating the robustness of the ACC-Debate method.
Implications and Future Work
The ACC-Debate framework signifies progress in harnessing the potential of LLMs for collaborative tasks. Its structured approach to debate training opens avenues for enhancing collaboration in real-world applications, such as decision-making systems and complex problem-solving environments.
The implications extend to developing more sophisticated AI systems capable of nuanced reasoning and effective teamwork. Future research may explore scalability to larger models and adaptation to more complex task domains beyond question-answering.
Conclusion
The ACC-Debate framework exemplifies a significant advancement in multi-agent reinforcement learning, focusing on improving collaboration among LLMs. By embedding debate as a learned behavior rather than relying on it as an emergent property, ACC-Debate paves the way for more capable, collaborative artificial intelligence systems.