BAMAS: Structuring Budget-Aware Multi-Agent Systems

Published 26 Nov 2025 in cs.MA and cs.AI | (2511.21572v1)

Abstract: LLM-based multi-agent systems have emerged as a powerful paradigm for enabling autonomous agents to solve complex tasks. As these systems scale in complexity, cost becomes an important consideration for practical deployment. However, existing work rarely addresses how to structure multi-agent systems under explicit budget constraints. In this paper, we propose BAMAS, a novel approach for building multi-agent systems with budget awareness. BAMAS first selects an optimal set of LLMs by formulating and solving an Integer Linear Programming problem that balances performance and cost. It then determines how these LLMs should collaborate by leveraging a reinforcement learning-based method to select the interaction topology. Finally, the system is instantiated and executed based on the selected agents and their collaboration topology. We evaluate BAMAS on three representative tasks and compare it with state-of-the-art agent construction methods. Results show that BAMAS achieves comparable performance while reducing cost by up to 86%.


Summary

The paper presents a two-stage approach that combines ILP for cost-optimal LLM provisioning with RL for selecting effective collaboration topologies.
It demonstrates up to 86% reduction in operational costs while achieving comparable or superior accuracy relative to leading MAS frameworks.
The adaptive topology selection dynamically adjusts to task requirements and budget constraints, highlighting risk aversion in resource-limited scenarios.

Structured Budget-Aware Multi-Agent Systems with BAMAS

Introduction

BAMAS ("Structuring Budget-Aware Multi-Agent Systems") (2511.21572) addresses a critical gap in current LLM-based multi-agent system (MAS) design: adhering to explicit budget constraints while maximizing performance. Existing MAS frameworks focus on performance but largely ignore operational costs, primarily incurred via token consumption in LLM queries. BAMAS introduces a two-stage approach combining Integer Linear Programming (ILP) for optimal LLM provisioning and offline reinforcement learning (RL) for collaboration topology selection. The resulting system adapts both its agent pool and workflow structure to explicit budget limits, yielding strong performance at a fraction of conventional costs.

Figure 1: The BAMAS pipeline provisions a cost-optimal LLM set and selects a collaboration topology with RL to guide cost-efficient multi-agent execution.

Problem Formulation and Methodology

Given a task $T$, an available LLM asset set $\mathcal{A}$, and a cost budget $B$, the objective is to instantiate a MAS that maximizes performance without exceeding $B$. BAMAS decomposes this into three steps, the two optimization stages (provisioning and topology selection) followed by agent instantiation:

Budget-Constrained LLM Provisioning: Uses an ILP to select a pool $\mathcal{P} \subseteq \mathcal{A}$ that maximizes cumulative performance weights while strictly meeting the budget constraint. Weights are defined recursively so that a higher-tier LLM (by Chatbot Arena ranking) is always preferred whenever it is affordable, which makes the selected pool lexicographically optimal with respect to model tier within the budget.
Collaboration Topology Selection: Employs an offline RL policy $\pi_\theta$ over a discrete set of workflow topologies (Linear, Star, Feedback, Planner-driven). The policy is conditioned on the embedded task specification and the budget, and is optimized with a reward that combines task success and cost efficiency: budget overruns are penalized, and cost savings are rewarded once the task succeeds.
Agent Instantiation: Assigns provisioned LLMs to agent roles as determined by the chosen topology, favoring the highest-weight models for roles with the greatest impact (e.g., critic or planner).
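
The provisioning and instantiation steps above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the model catalog, the cost units, the `max_copies` bound, and the particular weight recursion (each weight exceeds the combined weight of all lower-tier copies) are hypothetical, and exhaustive enumeration stands in for an ILP solver because the instance is tiny. It is not the paper's implementation.

```python
from itertools import product
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    tier: int        # higher = better ranked (hypothetical ordering)
    cost: float      # assumed cost per provisioned instance
    max_copies: int  # assumed upper bound on instances of this model


# Hypothetical catalog; names and numbers are illustrative only.
MODELS = [Model("small", 1, 1.0, 3), Model("medium", 2, 4.0, 2), Model("large", 3, 10.0, 1)]


def tier_weights(models: list[Model]) -> dict[str, int]:
    """Assumed recursion: a model's weight exceeds the total weight of every
    copy of every lower-tier model, so one better model always wins."""
    weights: dict[str, int] = {}
    running_total = 0
    for m in sorted(models, key=lambda m: m.tier):
        weights[m.name] = running_total + 1
        running_total += weights[m.name] * m.max_copies
    return weights


def provision(models: list[Model], budget: float) -> dict[str, int]:
    """Maximize sum(weight * count) subject to sum(cost * count) <= budget.
    Brute force replaces the ILP solver on this small search space."""
    weights = tier_weights(models)
    best_counts, best_value = {m.name: 0 for m in models}, 0
    for counts in product(*(range(m.max_copies + 1) for m in models)):
        cost = sum(c * m.cost for c, m in zip(counts, models))
        value = sum(c * weights[m.name] for c, m in zip(counts, models))
        if cost <= budget and value > best_value:
            best_counts = {m.name: c for m, c in zip(models, counts)}
            best_value = value
    return best_counts


def assign_roles(pool: dict[str, int], weights: dict[str, int],
                 roles_by_impact: list[str]) -> dict[str, str]:
    """Give the highest-weight instances to the highest-impact roles.
    Roles beyond the number of provisioned instances stay unassigned."""
    instances = sorted(
        (name for name, n in pool.items() for _ in range(n)),
        key=lambda name: weights[name],
        reverse=True,
    )
    return dict(zip(roles_by_impact, instances))


pool = provision(MODELS, budget=12.0)
roles = assign_roles(pool, tier_weights(MODELS), ["critic", "generator", "reviser"])
```

With a budget of 12, the sketch provisions one `large` and two `small` instances rather than the cheaper all-`small`/`medium` pool, and hands the `large` model to the first (highest-impact) role. A real deployment would replace the enumeration with an ILP solver once the catalog grows.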

Cost-Performance Trade-Offs and Empirical Results

BAMAS is evaluated against three state-of-the-art MAS frameworks (AutoGen, MetaGPT, and ChatDev), each operating under comparable LLM resources and controlled budget settings. Experiments cover three datasets in two domains: math reasoning (GSM8K, MATH) and program synthesis (MBPP).

BAMAS achieves a tunable trade-off:

Performance: Comparable or superior accuracy to all baselines at matched or lower budgets.
Cost Savings: Achieves up to 86% reduction in average cost, e.g., attaining 82.6% accuracy on MBPP at 529.2 cost units versus 82.2% accuracy at 3735.1 cost units for MetaGPT.
Budget Adherence: Strong compliance, with the rate of out-of-budget (OOB) executions rarely exceeding 3% even under severe constraints.

Figure 2: BAMAS delivers a consistent cost-accuracy Pareto frontier across all datasets, dominating baseline approaches.
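
As a quick check, the MBPP figures quoted above are consistent with the headline saving. Using only the two cost values from the text:

$$
1 - \frac{529.2}{3735.1} \approx 1 - 0.142 = 0.858
$$

That is roughly an 86% reduction, or about $3735.1 / 529.2 \approx 7.1\times$ lower cost, at an accuracy that is 0.4 percentage points higher (82.6% versus 82.2%).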

Unlike existing systems, BAMAS can be explicitly tuned to arbitrary budget levels, enabling cost-performance adaptation for deployment environments with highly variable constraints.

Analysis of BAMAS Components

Ablation studies against a greedy Naive-CostAware baseline (which incrementally adds more LLMs without considering topology or global optimization) show that BAMAS's joint ILP provisioning and RL-based topology selection outperforms the baseline at matched cost. The system's efficacy is not attributable to a single component but to the combination of globally optimal LLM selection and context-sensitive workflow adaptation.
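
To see why global selection can beat incremental selection, consider the following toy comparison. It is a sketch under stated assumptions: the catalog, weights, and costs are invented, and `greedy_cheapest_first` is a hypothetical stand-in for a greedy strategy, not a reproduction of the paper's Naive-CostAware baseline.

```python
from itertools import product

# Hypothetical catalog: (name, weight, cost, max_copies); values are illustrative only.
CATALOG = [("small", 1, 1.0, 3), ("medium", 4, 4.0, 2), ("large", 12, 10.0, 1)]
WEIGHT, COST, MAX_COPIES = 1, 2, 3  # tuple field indices


def total(counts, field):
    """Sum one field (weight or cost) over the chosen instance counts."""
    return sum(c * item[field] for c, item in zip(counts, CATALOG))


def greedy_cheapest_first(budget):
    """Add instances in order of increasing cost until nothing else fits."""
    counts, spent = [0] * len(CATALOG), 0.0
    for i in sorted(range(len(CATALOG)), key=lambda i: CATALOG[i][COST]):
        cost = CATALOG[i][COST]
        while counts[i] < CATALOG[i][MAX_COPIES] and spent + cost <= budget:
            counts[i] += 1
            spent += cost
    return tuple(counts)


def exhaustive_best(budget):
    """Check every combination within budget and keep the highest total weight."""
    ranges = (range(item[MAX_COPIES] + 1) for item in CATALOG)
    feasible = [c for c in product(*ranges) if total(c, COST) <= budget]
    return max(feasible, key=lambda c: total(c, WEIGHT))


greedy = greedy_cheapest_first(12.0)
optimal = exhaustive_best(12.0)
```

At a budget of 12, the greedy pass spends 11 units on three `small` and two `medium` instances and can no longer afford `large`, while the exhaustive search reserves budget for `large` and reaches a higher total weight (14 versus 11 in these invented units).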

Adaptive Topology Selection

BAMAS does not default to a static collaboration pattern, instead learning to vary topology by task and budget:

For mathematical reasoning (GSM8K, MATH): Feedback loops (generate-critique-revise) are preferred, corresponding to 40–70% of policy selections, especially on advanced tasks where iterative refinement is beneficial.
For code synthesis (MBPP): Linear topologies dominate, matching the demands of sequential program construction.
Under severe budget constraints, BAMAS exhibits risk aversion, eschewing complex (and costly) topologies like Feedback and Planner-driven in favor of Linear or Star, minimizing the risk of overrun.
Planner-driven topologies are almost never selected, as the RL policy learns that their high cost and inconsistent returns rarely justify them.

Figure 3: The distribution of selected topologies varies by dataset and budget, with more complex workflows favored as resources increase.
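
The budget-dependent behavior described above follows from how the reward trades success against spend. The sketch below shows one possible reward shape and a softmax choice over the four topologies. The coefficients `overrun_penalty` and `savings_bonus`, the exact functional form, and the example logits are all assumptions for illustration; in BAMAS the logits would come from the learned policy $\pi_\theta$ conditioned on the task embedding and budget.

```python
import math
import random

TOPOLOGIES = ["Linear", "Star", "Feedback", "Planner-driven"]


def reward(success: bool, cost: float, budget: float,
           overrun_penalty: float = 1.0, savings_bonus: float = 0.5) -> float:
    """Hypothetical shaping: an overrun is penalized in proportion to the excess;
    a success within budget earns 1 plus a bonus for the unspent fraction."""
    if cost > budget:
        return -overrun_penalty * (cost - budget) / budget
    if not success:
        return 0.0
    return 1.0 + savings_bonus * (budget - cost) / budget


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over topology logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def sample_topology(logits: list[float], rng: random.Random) -> str:
    """Draw one topology according to the policy's probabilities."""
    return rng.choices(TOPOLOGIES, weights=softmax(logits), k=1)[0]


# Placeholder logits standing in for the policy output on a tight budget.
tight_budget_logits = [2.0, 1.5, -0.5, -3.0]
choice = sample_topology(tight_budget_logits, random.Random(0))
```

Under this shaping, a cheap topology that succeeds scores higher than an expensive one that succeeds, and any run that exceeds the budget scores below a failure, which is one way a policy could come to avoid costly topologies when the budget is tight.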

Practical and Theoretical Implications

BAMAS operationalizes the trade-off between model quality, system architecture, and cost with a scalable, modular framework:

For deployment, practitioners can set explicit resource ceilings, which the system respects in the large majority of runs, while retaining near-optimal task performance.
The modular use of ILP and offline RL facilitates integration with future advances in LLM cost-modeling or workflow libraries.
Empirical evidence indicates that global, budget-aware planning outperforms greedy resource allocation and fixed-pattern multi-agent methods in the reported experiments.

Theoretically, BAMAS suggests that robust MAS construction under cost constraints is tractably approximated by classical combinatorial optimization (ILP) coupled with data-driven topology policy learning; the approach generalizes to richer topological pattern libraries and non-binary LLM resource pools.

Conclusion

BAMAS establishes a principled framework for MAS design under budget constraints, leveraging exact ILP for agent selection and RL for flexible workflow structuring. It consistently matches or exceeds the accuracy of state-of-the-art methods while substantially reducing cost, and its adaptive topology selection policy demonstrates nuanced awareness of both domain and budget. The method sets a foundation for cost-aware, scalable multi-agent LLM deployment and highlights promising directions for future work in MAS budgeted reasoning, dynamic agent composition, and multi-objective optimization.
