Adaptive In-conversation Team Building for Language Model Agents

Published 29 May 2024 in cs.CL (arXiv:2405.19425v3)

Abstract: Leveraging multiple LLM agents has been shown to be a promising approach for tackling complex tasks, while the effective design of multiple agents for a particular application remains an art. It is thus intriguing to answer a critical question: Given a task, how can we build a team of LLM agents to solve it effectively? Our new adaptive team-building paradigm offers a flexible solution, realized through a novel agent design named Captain Agent. It dynamically forms and manages teams for each step of a task-solving process, utilizing nested group conversations and reflection to ensure diverse expertise and prevent stereotypical outputs, allowing for a flexible yet structured approach to problem-solving. A comprehensive evaluation across six real-world scenarios demonstrates that Captain Agent significantly outperforms existing multi-agent methods with a 21.94% improvement in average accuracy, providing outstanding performance without requiring task-specific prompt engineering. Our exploration of different backbone LLMs and cost analysis further shows that Captain Agent can improve the conversation quality of weak LLMs and achieve competitive performance at extremely low cost, which illuminates the application of multi-agent systems.


Summary

  • The paper introduces an adaptive team-building paradigm using a Captain Agent to orchestrate dynamic multi-agent collaboration.
  • It leverages retrieval-augmented generation and nested group conversations to assemble specialized teams tailored to evolving task requirements.
  • Empirical evaluations reveal a 21.94% mean accuracy improvement over static teams, enhancing scalability, cost-effectiveness, and robustness.

Adaptive In-conversation Team Building for LLM Agents

Introduction

The paper "Adaptive In-conversation Team Building for LLM Agents" (2405.19425) addresses the challenge of constructing effective multi-agent systems based on LLMs for complex task-solving. The authors critique the prevailing static team-building paradigm, which predefines agent teams before task execution, and propose an adaptive approach that dynamically assembles and manages agent teams during the problem-solving process. The core contribution is the Captain Agent, an adaptive builder agent that orchestrates team formation, nested group conversations, and reflection, enabling flexible, context-sensitive collaboration among LLM agents.

Adaptive Team-Building Paradigm

Motivation and Limitations of Static Teams

Static team-building, where all agents are selected prior to task execution, suffers from scalability and adaptability issues. As task complexity increases, static teams require a large number of agents to cover all possible expertise, leading to context length limitations, management overhead, and reduced conversational quality due to irrelevant or redundant agent participation. Static teams also lack the flexibility to respond to evolving task requirements or unforeseen challenges during execution.

Captain Agent: Architecture and Workflow

The Captain Agent implements an adaptive team-building paradigm with two principal components:

  1. Adaptive Multi-agent Team Building: For each subtask, Captain Agent identifies required roles, retrieves or generates suitable agents and tools, and assembles a specialized team. This process leverages retrieval-augmented generation (RAG) using sentence embeddings (e.g., all-mpnet-base-v2) for semantic matching between role descriptions and agent/tool profiles. If no suitable agent is found, a new agent is generated with a tailored system message, incorporating both general and task-specific instructions.
  2. Nested Group Conversation and Reflection: The assembled team engages in a group chat, managed by the AutoGen framework, to collaboratively solve the subtask. Tool usage is integrated via free-form code execution, with results fed back into the conversation. A reflector LLM reviews the conversation, flags contradictions or issues, and provides a reflection report. If inconsistencies are detected, Captain Agent initiates a verification process with a new or modified team.

This cyclical process—plan, build team, solve subtask, reflect, and adapt—continues until the overall task is completed.
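The cycle above can be sketched as a simple control loop. This is a minimal illustration only; `plan`, `build_team`, `solve`, and `reflect` are hypothetical stand-ins for Captain Agent's components, not the paper's implementation:

```python
def captain_agent_loop(task, plan, build_team, solve, reflect, max_retries=2):
    """Sketch of the plan-build-solve-reflect-adapt cycle.

    All callables are hypothetical stand-ins:
      plan(task)                 -> list of subtasks
      build_team(subtask, issues) -> team of agents (retrieved or generated)
      solve(team, subtask)       -> answer from the nested group chat
      reflect(subtask, answer)   -> reflection report dict
    """
    results = []
    for subtask in plan(task):                    # 1. plan: decompose the task
        team = build_team(subtask, issues=None)   # 2. build a specialized team
        for _ in range(max_retries + 1):
            answer = solve(team, subtask)         # 3. solve via nested group chat
            report = reflect(subtask, answer)     # 4. reflect on the transcript
            if not report["has_contradiction"]:
                break
            # 5. adapt: rebuild the team to verify the flagged issues
            team = build_team(subtask, issues=report)
        results.append(answer)
    return results
```

The loop terminates per subtask either when the reflector raises no contradictions or when the retry budget is exhausted.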

Implementation Details

Agent and Tool Libraries

  • Agent Library: Populated by running Captain Agent on a subset of problems, storing generated agents with detailed profiles. The library also includes hand-crafted agents from frameworks like AutoGen.
  • Tool Library: Comprises callable Python functions for math, data analysis, and information retrieval, designed to match dataset patterns and enhance agent capabilities.
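A library entry might look like the following sketch. The field names and schema are illustrative assumptions, not the paper's actual data model:

```python
from dataclasses import dataclass

@dataclass
class AgentProfile:
    """Hypothetical agent-library entry; fields are illustrative."""
    name: str
    description: str           # free-text profile embedded for retrieval
    system_message: str        # instructions used when the agent is instantiated
    source: str = "generated"  # "generated" by Captain Agent or "hand-crafted"

@dataclass
class ToolProfile:
    """Hypothetical tool-library entry; fields are illustrative."""
    name: str
    description: str           # matched against role descriptions at retrieval time
    code: str                  # callable Python function stored as source
```

The `description` fields are what the retrieval step embeds and compares against role descriptions.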

Retrieval and Selection

  • Retrieval: For each role, the top-k agents and tools are retrieved from the libraries based on cosine similarity of sentence embeddings.
  • Selection: An LLM-based agent selector matches roles to agents, with an abstention mechanism to avoid forced, irrelevant assignments.
  • Generation: For unmatched roles, new agents are generated with system messages combining role-specific, general, and group chat instructions.
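The retrieval step can be sketched as a top-k cosine-similarity search. In practice the embeddings would come from a sentence encoder such as all-mpnet-base-v2 (per the paper); here plain vectors stand in, and `retrieve_top_k` is a hypothetical helper name:

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve_top_k(role_embedding, library, k=3):
    """library: list of (profile, embedding) pairs.
    Returns the k profiles whose embeddings are most similar to the role."""
    scored = sorted(library,
                    key=lambda item: cosine(role_embedding, item[1]),
                    reverse=True)
    return [profile for profile, _ in scored[:k]]
```

The retrieved candidates would then be passed to the LLM-based selector, which may abstain and trigger agent generation instead.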

Nested Conversation

  • Group Chat Management: AutoGen manages turn-taking and context, with agents executing code and tool calls in a shared environment.
  • Reflection: A reflector LLM summarizes the conversation, identifies contradictions, and determines if further verification is needed.
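The reflection step amounts to prompting a reviewer LLM with the transcript and parsing its verdict. The prompt wording and the `NEED_VERIFICATION` marker below are illustrative assumptions, not the paper's actual prompt:

```python
def build_reflection_prompt(transcript):
    """Assemble a hypothetical reflector prompt from (speaker, message) pairs."""
    history = "\n".join(f"{speaker}: {msg}" for speaker, msg in transcript)
    return (
        "Review the following group conversation. Summarize the outcome, "
        "flag any contradictions or unsupported claims, and end with "
        "NEED_VERIFICATION: yes/no on the last line.\n\n" + history
    )

def needs_verification(reflection_report):
    """Parse the reflector's verdict; True triggers a new verification team."""
    return reflection_report.strip().lower().endswith("need_verification: yes")
```

If the verdict is positive, Captain Agent rebuilds or modifies the team and reruns the subtask, closing the adapt loop.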

Cost and Model Diversity

The approach incurs higher computational cost than single-agent systems due to increased context and agent participation. However, adaptive team-building reduces unnecessary agent involvement compared to static teams. The system supports both proprietary (e.g., GPT-4) and open-weight (e.g., LLaMA-3-70B) LLMs as agent backbones, enabling cost-performance trade-offs.

Empirical Evaluation

Benchmarks and Scenarios

Captain Agent is evaluated on six real-world scenarios: mathematics (MATH), programming (HumanEval), data analysis (DABench), scientific problem-solving in chemistry and physics (SciBench), and world information retrieval (GAIA). Each scenario is paired with a challenging open-source dataset.

Baselines

Comparisons include:

  • Vanilla LLM (single prompt)
  • AutoAgents (static multi-agent)
  • Meta-prompting (meta-model task decomposition)
  • AutoGen Assistant + Executor (two-agent system)
  • Scenario-specific baselines for GAIA

All methods use the same task-specific prompts and backbone LLMs for fairness.

Results

  • Captain Agent achieves a mean accuracy improvement of 21.94% over baselines across all scenarios.
  • In world information retrieval (GAIA), Captain Agent outperforms all leaderboard baselines with minimal prompt engineering.
  • Ablation studies show that adaptive team-building consistently outperforms static team-building, especially in scenarios requiring dynamic expertise composition.
  • Both agent and tool libraries are critical for optimal performance; removing either significantly degrades results, particularly on complex, multi-step tasks.
  • Open-weight models (e.g., LLaMA-3-70B) as agent backbones can approach or surpass the performance of some proprietary models at a fraction of the cost, though task preference and model selection remain important.

Analysis and Implications

Theoretical Implications

The adaptive team-building paradigm operationalizes principles from human organizational behavior—dynamic team assembly, role specialization, and iterative reflection—within LLM-based agent systems. This approach addresses the context length and specialization limitations of static teams, enabling more scalable and robust multi-agent collaboration.

Practical Implications

  • Generalization: Captain Agent requires only basic task instructions, avoiding heavy prompt engineering and manual agent design.
  • Scalability: Adaptive team-building reduces context bloat and irrelevant agent participation, improving efficiency and conversational quality.
  • Cost-Effectiveness: The ability to leverage open-weight models and minimize unnecessary agent involvement enables practical deployment in resource-constrained settings.
  • Robustness: The reflection mechanism and verification process mitigate hallucinations, factual errors, and stereotypical outputs.

Limitations and Future Directions

  • Cost: Multi-agent conversations with large models remain expensive; further work on conversation pruning and context compression is warranted.
  • Model Diversity: Task preference among LLMs affects nested chat quality; systematic evaluation and selection of agent backbones are needed.
  • Evaluation: Data leakage and benchmark limitations complicate fair assessment of agent capabilities; more rigorous evaluation protocols are necessary.

Conclusion

The adaptive in-conversation team-building paradigm, instantiated by Captain Agent, demonstrates significant improvements in multi-agent LLM task-solving across diverse domains. By dynamically assembling specialized teams, integrating tool use, and employing iterative reflection, the approach overcomes key limitations of static team-building. The results highlight the importance of adaptability, modularity, and reflection in the design of LLM-based agent systems. Future research should address cost reduction, model diversity, and evaluation rigor to further advance the practical deployment of adaptive multi-agent frameworks.
