Multi-agent Architecture Search via Agentic Supernet (2502.04180v2)

Published 6 Feb 2025 in cs.LG, cs.CL, and cs.MA

Abstract: LLM-empowered multi-agent systems extend the cognitive boundaries of individual agents through disciplined collaboration and interaction, while constructing these systems often requires labor-intensive manual designs. Despite the availability of methods to automate the design of agentic workflows, they typically seek to identify a static, complex, one-size-fits-all system, which, however, fails to dynamically allocate inference resources based on the difficulty and domain of each query. To address this challenge, we shift away from the pursuit of a monolithic agentic system, instead optimizing the agentic supernet, a probabilistic and continuous distribution of agentic architectures. We introduce MaAS, an automated framework that samples query-dependent agentic systems from the supernet, delivering high-quality solutions and tailored resource allocation (e.g., LLM calls, tool calls, token cost). Comprehensive evaluation across six benchmarks demonstrates that MaAS (I) requires only 6-45% of the inference costs of existing handcrafted or automated multi-agent systems, (II) surpasses them by 0.54%-11.82%, and (III) enjoys superior cross-dataset and cross-LLM-backbone transferability.

Summary

  • The paper proposes the agentic supernet framework, which allows dynamic, query-adaptive multi-agent architecture search over a continuous distribution of operator cascades.
  • Experimental results show the method outperforms baselines by up to 11.82% while reducing inference cost by up to 94% across various benchmarks like math, code, and tool use.
  • Key technical contributions include a cost-aware optimization objective, an early-exit mechanism for resource efficiency, and novel textual gradients for updating black-box components.

The paper proposes a novel framework for automating multi-agent system design by shifting from a static, one-size-fits-all architecture to a dynamic, probabilistic formulation called the agentic supernet. The framework targets LLM-empowered multi-agent systems, extending their collective reasoning and tool use while reducing inference cost through query-dependent resource allocation.

Overview and Key Contributions

The work formulates multi-agent system design as a search over a continuous distribution of architectures rather than finding a single optimal configuration. Key contributions include:

  • The introduction of the agentic supernet, which models a cascade of agentic operators (e.g., Chain-of-Thought (CoT), Multi-agent Debate, ReAct, among others) arranged in layers, where each operator is a composite module involving multiple LLM invocations and tool interactions.
  • A controller network that samples architectures from the supernet in a query-adaptive manner. In this scheme, the architecture is built layer by layer, and an early-exit operator is integrated to allow the system to halt resource usage when a query can be answered with minimal processing.
  • A cost-constrained optimization objective that balances solution utility against resource expenditure, with token cost, API cost, and wall-clock time explicitly considered. The objective takes the form $\min_{G \sim Q_\theta} \, \mathbb{E}_{(q,a)\sim \mathcal{D}} \big[ -\, p(a \mid q, T, O) + \lambda \, C(G; q) \big]$, where $p(a \mid q, T, O)$ is the probability of obtaining the correct answer via the sampled architecture $G$, $C(G; q)$ is the cost function (with costs measured by token usage and external API calls), and $\lambda$ is a trade-off parameter; a minimal estimation sketch follows this list.
  • The use of an empirical Bayes Monte Carlo procedure to approximate gradients for the distribution parameters, and the novel utilization of agentic textual gradients that enable backpropagation-like updates to black-box components such as natural language prompts and tool call configurations.
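
To make the cost-aware objective concrete, the sketch below shows one way the expectation could be estimated from Monte Carlo samples of the supernet. The names `sample_fn`, `execute_fn`, and `cost_fn` are hypothetical stand-ins for the framework's sampling, execution, and cost-accounting routines; the exact estimator used in MaAS may differ.

```python
def cost_aware_loss(queries, answers, sample_fn, execute_fn, cost_fn,
                    lam=0.1, num_samples=4):
    """Monte Carlo estimate of E[-p(a|q,T,O) + lambda * C(G; q)] (illustrative sketch).

    sample_fn(q)        -> architecture G drawn from the supernet distribution Q_theta
    execute_fn(G, q, a) -> estimated probability p(a | q, T, O) that G answers q correctly
    cost_fn(G, q)       -> resource cost C(G; q), e.g. tokens, API calls, wall-clock time
    """
    total = 0.0
    for q, a in zip(queries, answers):
        per_query = 0.0
        for _ in range(num_samples):       # K sampled architectures per query (the paper reports K = 4)
            G = sample_fn(q)
            utility = execute_fn(G, q, a)  # performance term
            cost = cost_fn(G, q)           # cost term, weighted by lambda
            per_query += -utility + lam * cost
        total += per_query / num_samples
    return total / len(queries)
```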

Methodology and Technical Details

The framework defines an agentic operator as a structured unit containing LLM instances, prompts, and tool settings. Multiple operators are composed into a multi-agent system represented as a directed acyclic graph (DAG), where each node corresponds to an operator and edges reflect the flow of intermediate reasoning. The architecture sampling proceeds as follows:

  • A query-dependent controller (implemented in a Mixture-of-Experts style) scores the available operators at each layer. The sampling process continues until a cumulative activation threshold is reached or an early-exit operator is sampled; a minimal sampling sketch follows this list.
  • The controller’s conditional probability distribution over operators is updated jointly with the operator parameters using two distinct gradients: a Monte Carlo-based gradient for the distribution parameters and a textual gradient mechanism for operator modifications.
  • The incorporation of an early-exit mechanism is particularly effective in dynamically modulating the depth of the multi-agent system according to query complexity, thereby reducing unnecessary LLM and tool invocations.
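
As a rough illustration of the layer-by-layer sampling described above, the sketch below builds an architecture one layer at a time, activating operators until a cumulative probability threshold is reached and halting when an early-exit operator is drawn. This is one plausible reading of the cumulative-threshold rule; `controller`, `operators`, and the threshold value are hypothetical placeholders rather than the paper's actual interfaces.

```python
import math

EARLY_EXIT = "early_exit"      # placeholder name for the special early-exit operator

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def sample_architecture(query_features, controller, operators,
                        max_layers=4, activation_threshold=0.7):
    """Layer-by-layer, query-adaptive sampling with early exit (illustrative only).

    controller(query_features, layer_idx, op) -> unnormalized score for operator op
    """
    layers = []
    for layer_idx in range(max_layers):
        scores = [controller(query_features, layer_idx, op) for op in operators]
        probs = softmax(scores)

        # Activate the highest-probability operators until their cumulative
        # probability mass reaches the threshold (query-dependent layer width).
        ranked = sorted(zip(operators, probs), key=lambda x: -x[1])
        chosen, cumulative = [], 0.0
        for op, p in ranked:
            chosen.append(op)
            cumulative += p
            if cumulative >= activation_threshold:
                break

        # Drawing the early-exit operator stops construction, saving LLM/tool calls.
        if EARLY_EXIT in chosen:
            break
        layers.append(chosen)
    return layers
```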

Experimental Evaluation and Numerical Results

Extensive experimentation is conducted across benchmarks in math reasoning (GSM8K, MATH, MultiArith), code generation (HumanEval, MBPP), and tool use (GAIA). The results demonstrate that:

  • The multi-agent systems generated by the proposed method outperform both handcrafted and existing automated systems by 0.54% to 11.82% on different tasks.
  • In terms of inference economics, the proposed system uses as little as 6% to 45% of the inference cost (measured via token cost, API calls, and wall-clock time) compared to baseline methods. For instance, on the MATH benchmark, the framework required an API cost of only $0.42 per query while achieving competitive accuracy.
  • The agentic supernet displays notable transferability across both datasets and LLM backbones. In experiments where the supernet optimized on one LLM (gpt-4o-mini) was transferred to others (Qwen-2.5-72b, llama-3.1-70b), performance improved by approximately 4.98% to 5.50%.
  • A comprehensive sensitivity analysis indicates that the depth of the supernet, the cost-penalty coefficient, and the number of sampling iterations (with K = 4 providing a low-variance estimate) are critical for achieving an optimal balance between performance and resource expenditure.

Ablation Studies and Analysis

The authors perform ablation studies that systematically remove components such as the textual gradient update, the early-exit operator, and the cost constraint. The findings reveal that removing the textual gradient, the component responsible for enabling self-evolution of black-box operator parameters, results in the largest drop in performance. Removing the early-exit mechanism increases resource consumption without substantially affecting accuracy, while the absence of explicit cost constraints leads to higher inference costs with marginal impact on performance.
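
The summary does not spell out the textual-gradient procedure in detail; the sketch below illustrates the general idea of using an LLM-generated critique as a "gradient" to revise a black-box operator prompt. The two-step critique-then-rewrite structure and the `llm` wrapper are illustrative assumptions, not the authors' exact implementation.

```python
def textual_gradient_step(llm, operator_prompt, query, trace, target_answer):
    """Illustrative textual-gradient update for a black-box operator prompt.

    llm(prompt) -> str is a hypothetical wrapper around any chat-completion API.
    """
    # 1. Ask the LLM to critique the operator's behaviour (the "textual gradient").
    critique = llm(
        "The following agent prompt was used to answer a query, but the result "
        "was judged incorrect or inefficient.\n"
        f"Prompt: {operator_prompt}\nQuery: {query}\n"
        f"Execution trace: {trace}\nExpected answer: {target_answer}\n"
        "Briefly explain what in the prompt should change."
    )
    # 2. Apply the critique to produce a revised prompt (the "update step").
    revised_prompt = llm(
        "Rewrite the prompt below so that it addresses the critique. "
        "Return only the new prompt.\n"
        f"Prompt: {operator_prompt}\nCritique: {critique}"
    )
    return revised_prompt
```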

Conclusion

In summary, the paper presents a carefully engineered framework that dynamically adapts multi-agent architectures in a cost-efficient manner by exploring a continuous distribution over agentic systems. The blend of probabilistic sampling, cost-aware optimization, and innovative textual gradient techniques results in a framework that is not only superior in performance across diverse benchmarks but also significantly more efficient in resource consumption. The methodological advancements presented here have substantial implications for the design and deployment of adaptive, self-organizing systems in complex LLM-driven applications.