
Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems (2502.11098v1)

Published 16 Feb 2025 in cs.AI, cs.LG, and cs.MA

Abstract: Recent advancements in LLM-based multi-agent (LLM-MA) systems have shown promise, yet significant challenges remain in managing communication and refinement when agents collaborate on complex tasks. In this paper, we propose *Talk Structurally, Act Hierarchically (TalkHier)*, a novel framework that introduces a structured communication protocol for context-rich exchanges and a hierarchical refinement system to address issues such as incorrect outputs, falsehoods, and biases. *TalkHier* surpasses various types of SoTA, including inference scaling model (OpenAI-o1), open-source multi-agent models (e.g., AgentVerse), and majority voting strategies on current LLM and single-agent baselines (e.g., ReAct, GPT4o), across diverse tasks, including open-domain question answering, domain-specific selective questioning, and practical advertisement text generation. These results highlight its potential to set a new standard for LLM-MA systems, paving the way for more effective, adaptable, and collaborative multi-agent frameworks. The code is available at https://github.com/sony/talkhier.

Summary

  • The paper introduces TalkHier, a framework for LLM multi-agent systems that uses structured communication and hierarchical refinement to improve collaboration and performance on complex tasks.
  • Evaluated on benchmarks like MMLU and WikiQA, TalkHier achieved higher accuracy and better performance metrics compared to various single-agent, multi-agent, and proprietary baselines.
  • Ablation studies demonstrated that both the structured communication protocol and the hierarchical refinement component are essential for TalkHier's superior performance, despite the framework's relatively high API cost.

The paper introduces Talk Structurally, Act Hierarchically (TalkHier), a novel collaborative LLM-MA (LLM-based Multi-Agent) framework designed to enhance communication and refinement among agents working on complex tasks. TalkHier integrates a structured communication protocol with a hierarchical refinement system, addressing the limitations of disorganized communication and ineffective refinement schemes in existing LLM-MA systems.

The key contributions of TalkHier are:

  • A well-structured, context-rich communication protocol that incorporates messages $\mathbf{M}_{ij}^{(t)}$, intermediate outputs $\mathbf{I}_{ij}^{(t)}$, and relevant background information $\mathbf{B}_{ij}^{(t)}$ to ensure clarity and precision in agent communication. Communication between agents is represented by communication events $c_{ij}^{(t)} \in \mathcal{C}_p$, where each event $c_{ij}^{(t)}$ encapsulates the interaction from agent $v_i$ to agent $v_j$ along an edge $e_{ij} \in \mathcal{E}$ at time step $t$. Formally, a communication event $c_{ij}^{(t)}$ is defined as:

    $c_{ij}^{(t)} = (\mathbf{M}_{ij}^{(t)}, \mathbf{B}_{ij}^{(t)}, \mathbf{I}_{ij}^{(t)})$, where:

    • $\mathbf{M}_{ij}^{(t)}$ is the message content sent from $v_i$ to $v_j$, containing instructions or clarifications.
    • $\mathbf{B}_{ij}^{(t)}$ is background information that ensures coherence and task progression, including the problem's core details and intermediate decisions.
    • $\mathbf{I}_{ij}^{(t)}$ is the intermediate output generated by $v_i$ at time step $t$, shared with $v_j$ to support task progression and traceability.
  • A hierarchical refinement framework that enhances multi-agent evaluation systems, enabling agents to act hierarchically. This approach addresses the difficulty in summarizing opinions or feedback as the number of agents increases and mitigates biases caused by the order of feedback processing.
  • Each agent $v_i$ maintains an independent memory, $Memory_i$, to enhance efficiency and scalability. This agent-specific memory allows each agent to independently retain and reason over its past interactions and knowledge.
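The structured communication event described above can be sketched as a simple data structure. This is a minimal illustration, not the paper's implementation; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CommunicationEvent:
    """One communication event c_ij^(t) from agent v_i to agent v_j at step t."""
    sender: str        # v_i
    receiver: str      # v_j
    step: int          # time step t
    message: str       # M_ij^(t): instructions or clarifications
    background: str    # B_ij^(t): core problem details and intermediate decisions
    intermediate: str  # I_ij^(t): intermediate output from v_i, for traceability


# Example: the Evaluation Supervisor dispatching a draft to one evaluator.
event = CommunicationEvent(
    sender="eval_supervisor",
    receiver="evaluator_1",
    step=0,
    message="Score the draft for factual accuracy.",
    background="Task: open-domain QA; the draft was produced by the Generator.",
    intermediate="Draft answer text goes here.",
)
```

Bundling the message with its background and intermediate output is what makes each exchange context-rich: the receiving agent does not need to reconstruct the task state from a flat chat history.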

The methodology of TalkHier involves representing the LLM-MA system as a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ denotes the set of agents (nodes) and $\mathcal{E}$ represents the set of communication pathways (edges). Each agent $v_i \in \mathcal{V}$ is defined by its role $Role_i$, plugins $Plugins_i$, memory $Memory_i$, and type $Type_i$, which specifies whether the agent is a Supervisor ($S$) or a Member ($M$).
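Under these definitions, the system is just a graph of typed agent nodes with directed communication edges. A minimal sketch, with illustrative names (the real agent attributes and team layout are defined in the paper's code, not here):

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """A node v_i of the LLM-MA graph G = (V, E)."""
    role: str                                      # Role_i, e.g. "Generator"
    agent_type: str                                # Type_i: "S" (Supervisor) or "M" (Member)
    plugins: list = field(default_factory=list)    # Plugins_i
    memory: list = field(default_factory=list)     # Memory_i, agent-specific and independent


# V: the set of agents (nodes), keyed by an illustrative identifier.
V = {
    "main_supervisor": Agent("Main Supervisor", "S"),
    "generator": Agent("Generator", "M"),
    "revisor": Agent("Revisor", "M"),
    "eval_supervisor": Agent("Evaluation Supervisor", "S"),
}

# E: directed communication pathways (edges e_ij) between agents.
E = {
    ("main_supervisor", "generator"),
    ("generator", "main_supervisor"),
    ("main_supervisor", "eval_supervisor"),
    ("eval_supervisor", "main_supervisor"),
    ("main_supervisor", "revisor"),
}
```

Because each `Agent` carries its own `memory`, agents retain and reason over their own past interactions independently rather than sharing one global context.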

The hierarchical structure of agents is defined as:

$\mathcal{V}_\text{main} = \{v_\text{main}^S, v_\text{main}^\text{Gen}, v_\text{eval}^S, v_\text{main}^\text{Rev}\}$,

$\mathcal{V}_\text{eval} = \{v_\text{eval}^S, v_\text{eval}^{E_1}, v_\text{eval}^{E_2}, \ldots, v_\text{eval}^{E_k}\}$,

where the Main Supervisor ($v_\text{main}^S$) and the Evaluation Supervisor ($v_\text{eval}^S$) oversee their respective teams' operations and assign tasks to each member. The Generator ($v_\text{main}^\text{Gen}$) produces solutions for a given problem, and the Revisor ($v_\text{main}^\text{Rev}$) refines outputs based on feedback. The evaluation team comprises $k$ independent evaluators $v_\text{eval}^{E_1}, \ldots, v_\text{eval}^{E_k}$, each of which outputs evaluation results for a given problem based on its specified metric.

The hierarchical refinement process is detailed in Algorithm 1 of the paper, involving task assignment, distribution, evaluation, feedback aggregation, and revision. The process continues iteratively until a quality threshold $\mathcal{M}_\text{threshold}$ is met or a maximum number of iterations $T_\text{max}$ is reached.

The paper explores several research questions:

  • RQ1: Does TalkHier outperform existing multi-agent, single-agent, and proprietary approaches on general benchmarks?
  • RQ2: How does TalkHier perform on open-domain question-answering tasks?
  • RQ3: What is the contribution of each component of TalkHier to its overall performance?
  • RQ4: How well does TalkHier generalize to more practical but complex generation tasks?

TalkHier was evaluated on a diverse set of benchmarks: the Massive Multitask Language Understanding (MMLU) benchmark, WikiQA, and a camera dataset for advertisement text generation. The baselines included GPT-4o, OpenAI-o1-preview, ReAct, AutoGPT, AgentVerse, GPTSwarm, and AgentPrune.

The results on the MMLU dataset showed that TalkHier achieves the highest average accuracy (88.38%) across five domains, outperforming open-source multi-agent models like AgentVerse (83.66%) and majority voting strategies applied to LLM and single-agent baselines. On the WikiQA dataset, TalkHier outperformed baselines in both Rouge-1 (0.3461) and BERTScore (0.6079), demonstrating its ability to generate accurate and semantically relevant answers.

Ablation studies demonstrated the contribution of each component of TalkHier. Removing the evaluation supervisor caused a significant drop in accuracy, underscoring the necessity of the hierarchical refinement approach. Ablations on the communication protocol showed that removing either intermediate outputs or background information degrades performance. Evaluation on the camera dataset showed that TalkHier outperforms baselines in Faithfulness, Fluency, and Attractiveness, with a mean gain of approximately 17.63% over the best-performing baseline, OKG, across all metrics.

The paper identifies a limitation of TalkHier as the relatively high API (Application Programming Interface) cost associated with the experiments. The structured interaction among multiple agents increases computational expenses, raising concerns about the accessibility of LLM research for researchers with limited resources.
