- The paper introduces AgentDropout, which dynamically optimizes multi-agent LLM collaboration by eliminating redundant nodes and edges.
- It employs a two-stage strategy using trainable adjacency matrices and policy gradients to balance task performance with token efficiency.
- Experiments on reasoning, math, and code generation tasks show notable gains, including an average 21.6% reduction in prompt token usage alongside improved performance.
LLM-based Multi-Agent Systems (MAS) frequently encounter challenges related to communication overhead and suboptimal task performance. Redundancy in communication, both in terms of unnecessary information exchange (edges) and the participation of non-critical agents (nodes) at certain stages, contributes to these issues. The paper introduces AgentDropout, a methodology designed to dynamically optimize the communication topology of MAS by identifying and eliminating redundant agents and communication links across different rounds (2503.18891). This approach draws inspiration from management theory, where team roles are often adjusted dynamically for efficiency.
Methodology: AgentDropout
AgentDropout employs a two-stage optimization process to learn a sparse, effective communication graph represented by weighted adjacency matrices. The goal is to maximize task performance while minimizing token consumption through the elimination of less relevant agents and connections.
Node Dropout
The initial phase focuses on identifying and removing agents (nodes) whose contributions are minimal within specific communication rounds.
- Graph Representation: The communication structure is modeled as a weighted graph with trainable intra-round ($\tilde{\mathcal{A}}_{\text{intra}}$) and inter-round ($\tilde{\mathcal{A}}_{\text{inter}}$) adjacency matrices. Initial weights are typically set uniformly (e.g., 0.5).
- Optimization for Performance: The intra-round matrices $\tilde{\mathcal{A}}_{\text{intra}}$ are trained to maximize the expected task performance $\mu(G)$, where $G$ is the communication graph sampled based on the matrices. Since performance metrics (e.g., accuracy on benchmarks) are often non-differentiable with respect to the graph structure, an unbiased policy gradient estimator is utilized for optimization:

$$\nabla_{\theta} J(\theta) = \mathbb{E}_{G \sim P_{\theta}(G)}\left[\nabla_{\theta} \log P_{\theta}(G)\,(\mu(G) - b)\right]$$

Here, $\theta$ represents the parameters of the adjacency matrices, $P_{\theta}(G)$ is the probability of sampling graph $G$, $\mu(G)$ is the performance on graph $G$, and $b$ is a baseline to reduce variance.
- Node Identification: After training $\tilde{\mathcal{A}}_{\text{intra}}$, the weighted in-degree $d_{\text{in}}(v,t)$ and out-degree $d_{\text{out}}(v,t)$ are calculated for each node $v$ in each round $t$. The total degree $d(v,t) = d_{\text{in}}(v,t) + d_{\text{out}}(v,t)$ serves as an indicator of the node's importance in that round.
- Node Elimination: Nodes are ranked based on their total degree $d(v,t)$ within each round. A fixed proportion $\alpha$ of nodes with the lowest degrees are designated as dropout nodes for their respective rounds. These nodes, along with their incident edges, are removed, resulting in updated adjacency matrices. The selection criterion is:

$$\text{DropNode}(v,t) \iff \text{rank}(d(v,t)) \leq \alpha \times N$$

where $N$ is the total number of agents.
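The Node Dropout stage above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `evaluate` is a hypothetical placeholder for the non-differentiable task metric $\mu(G)$, and the sample count, learning rate, and clipping bounds are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 5, 3          # agents, communication rounds
alpha = 0.2          # fraction of nodes dropped per round

# Trainable intra-round adjacency weights, initialized uniformly at 0.5.
A_intra = np.full((T, N, N), 0.5)

def evaluate(G):
    # Hypothetical stand-in for the task performance mu(G).
    return G.mean()

def reinforce_step(A, lr=0.1, n_samples=8):
    """One policy-gradient update: E[grad log P(G) * (mu(G) - b)]."""
    grads, scores = [], []
    for _ in range(n_samples):
        G = (rng.random(A.shape) < A).astype(float)    # Bernoulli edge sampling
        # d/dA log Bernoulli(G; A) = G/A - (1-G)/(1-A)
        grads.append(G / A - (1 - G) / (1 - A))
        scores.append(evaluate(G))
    b = np.mean(scores)                                # variance-reducing baseline
    g = np.mean([gr * (s - b) for gr, s in zip(grads, scores)], axis=0)
    return np.clip(A + lr * g, 0.05, 0.95)             # keep probabilities valid

for _ in range(50):
    A_intra = reinforce_step(A_intra)

# Node identification: total weighted degree d(v,t) = d_in + d_out.
degree = A_intra.sum(axis=2) + A_intra.sum(axis=1)     # shape (T, N)
k = int(alpha * N)
drop = np.argsort(degree, axis=1)[:, :k]               # lowest-degree nodes per round

# Node elimination: remove dropout nodes and their incident edges per round.
for t in range(T):
    A_intra[t, drop[t], :] = 0.0                       # outgoing edges
    A_intra[t, :, drop[t]] = 0.0                       # incoming edges
```

Note that a node dropped in round $t$ remains active in other rounds, reflecting the round-wise (rather than global) nature of the elimination.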
Edge Dropout
Following Node Dropout, the second phase targets the removal of redundant communication links (edges).
- Re-initialization and Training: The adjacency matrices ($\tilde{\mathcal{A}}_{\text{intra}}$ and $\tilde{\mathcal{A}}_{\text{inter}}$), with the nodes removed by Node Dropout excluded, are re-initialized and trained again from scratch.
- Optimization for Performance and Sparsity: The optimization objective now incorporates both task performance $\mu(G)$ and communication efficiency. Efficiency is promoted by adding a low-rank sparsity regularization term to the objective function. The objective becomes:

$$\max_{\theta} \; \mathbb{E}_{G \sim P_{\theta}(G)}[\mu(G)] - \lambda \sum_{A \in \{\tilde{\mathcal{A}}_{\text{intra}}, \tilde{\mathcal{A}}_{\text{inter}}\}} \text{rank}(A)$$

where $\lambda$ is a hyperparameter balancing performance and sparsity. The rank function, being NP-hard to optimize directly, is approximated using the nuclear norm $\|A\|_*$, which serves as a convex relaxation:

$$\max_{\theta} \; \mathbb{E}_{G \sim P_{\theta}(G)}[\mu(G)] - \lambda \sum_{A \in \{\tilde{\mathcal{A}}_{\text{intra}}, \tilde{\mathcal{A}}_{\text{inter}}\}} \|A\|_*$$

The performance term $\mathbb{E}_{G \sim P_{\theta}(G)}[\mu(G)]$ is optimized using policy gradients as before.
- Edge Identification and Elimination: After training, edges corresponding to the lowest weights in the optimized matrices $\tilde{\mathcal{A}}_{\text{intra}}$ and $\tilde{\mathcal{A}}_{\text{inter}}$ are pruned. A proportion $\beta$ of edges with the smallest weights are removed. The criterion for edge $(u,v)$ at round $t$, represented by weight $w_{uv,t}$, is:

$$\text{DropEdge}(u,v,t) \iff \text{rank}(w_{uv,t}) \leq \beta \times M$$

where $M$ is the total number of potential edges.
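The sparsity side of this stage can be sketched as below. This is a simplified fragment, not the paper's full training loop: it assumes the performance term has already been handled by the policy-gradient updates shown earlier, and isolates the nuclear-norm penalty (whose subgradient at $A = U\Sigma V^{\top}$ is $UV^{\top}$) and the final $\beta$-fraction pruning; the matrix size, $\lambda$, and $\beta$ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.uniform(0.3, 0.7, size=(5, 5))   # one trained weighted adjacency matrix
lam, beta = 0.01, 0.25                   # sparsity weight, fraction of edges pruned

def nuclear_norm_grad(A):
    """Subgradient of the nuclear norm ||A||_* (convex surrogate for rank)."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt

# Gradient descent on the penalty term lam * ||A||_* alone
# (in the full objective this is combined with the policy-gradient step).
for _ in range(20):
    A = np.clip(A - lam * nuclear_norm_grad(A), 0.0, 1.0)

# Edge elimination: drop the beta fraction of edges with the smallest weights.
w = np.sort(A.flatten())
m = int(beta * A.size)                   # number of edges to remove
threshold = w[m]                         # weights ranked <= beta*M fall below this
A_pruned = np.where(A < threshold, 0.0, A)
```

The nuclear norm pushes the weight matrix toward low rank, which concentrates communication on a few influential links and makes the subsequent threshold pruning less damaging to performance.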
- Final Graph Sampling: The resulting doubly-pruned weighted adjacency matrices define the probability distribution for sampling the final communication graph $\hat{G}$ during inference using the DAGSample algorithm. This algorithm ensures the sampled graph is a Directed Acyclic Graph (DAG), preventing cyclical dependencies.
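A simple way to enforce the DAG constraint during sampling is sketched below. This is a generic acyclicity-preserving scheme, not necessarily the paper's exact DAGSample procedure: it draws a random topological order and keeps only Bernoulli-sampled edges that point "forward" in that order, which guarantees the result has no cycles.

```python
import numpy as np

rng = np.random.default_rng(2)

def dag_sample(A):
    """Sample a DAG from edge-probability matrix A by keeping forward edges only."""
    n = A.shape[0]
    order = rng.permutation(n)                     # random topological order
    pos = np.empty(n, dtype=int)
    pos[order] = np.arange(n)                      # node -> position in the order
    G = (rng.random(A.shape) < A).astype(int)      # Bernoulli edge sampling
    forward = pos[:, None] < pos[None, :]          # u precedes v in the order
    return G * forward                             # drop backward edges => acyclic

A = np.full((4, 4), 0.8)                           # illustrative edge probabilities
G = dag_sample(A)
```

Acyclicity can be checked by noting that the adjacency matrix of a DAG on $n$ nodes is nilpotent: $G^{n} = 0$, since no path can be longer than $n-1$ edges.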
Experimental Results
AgentDropout was evaluated on reasoning (MMLU), mathematics (GSM8K, AQuA, MultiArith, SVAMP), and code generation (HumanEval) tasks using Llama3-8B, Qwen2.5-72B, and Deepseek-V3-671B as base LLMs.
- Task Performance: AgentDropout demonstrated consistent performance improvements over baselines including single-LLM inference, Chain-of-Thought (CoT), standard multi-round MAS (MAS_T), and AgentPrune (a state-of-the-art edge-pruning method). With Llama3-8B, AgentDropout achieved an average performance gain of 1.14 points across benchmarks compared to AgentPrune. The method also improved performance stability, particularly with the smaller Llama3-8B model.
- Token Consumption: Significant reductions in both prompt and completion tokens were observed. Compared to AgentPrune, AgentDropout achieved average reductions of 21.6% in prompt tokens and 18.4% in completion tokens. Specific figures for Llama3-8B (averaged across tasks) show AgentDropout using 3.3M prompt tokens and 839K completion tokens, compared to AgentPrune's 4.2M and 1.0M, respectively. Similar reductions were observed for larger models.
Robustness and Transferability
- Structure Robustness: The effectiveness of AgentDropout was shown to be robust to variations in the initial communication graph structure (e.g., fully connected, layered, random). Optimized topologies derived from different initial structures yielded comparable performance and efficiency gains.
- Domain Transferability: The communication topology learned by AgentDropout on one dataset (e.g., AQuA) exhibited strong transfer performance when applied to other datasets within the same domain (e.g., GSM8K, MultiArith, SVAMP). This suggests the learned pruning strategies capture generalizable collaborative patterns relevant to the task type (mathematical reasoning), reducing the need for extensive tuning on every new dataset.
- Ablation Studies: Ablations confirmed the necessity of both Node Dropout and Edge Dropout stages. Applying only one stage resulted in inferior performance or efficiency compared to the full AgentDropout method. Furthermore, the learned dropout strategy significantly outperformed random node/edge dropout, validating the effectiveness of the optimization process.
In conclusion, AgentDropout presents a novel approach for optimizing LLM-based MAS by dynamically eliminating both redundant agents and communication links based on learned contributions across different stages of problem-solving. The method yields substantial improvements in token efficiency and notable gains in task performance, demonstrating robustness and transferability across tasks and initial structures. This technique offers a practical way to enhance the feasibility and effectiveness of multi-agent collaboration using LLMs.