Dynamic Auction-based Language Agent (DALA)

Updated 24 November 2025

The paper introduces an auction-based paradigm using VCG mechanisms to allocate token-based communication among LLM agents efficiently.
It employs MAPPO for agent training, which guides strategic message generation and bidding to balance cost and performance.
Empirical evaluations demonstrate state-of-the-art reasoning and code-generation performance while significantly reducing token usage.

The Dynamic Auction-based Language Agent (DALA) is a methodological paradigm for multi-agent communication and decision-making that operationalizes auctions as a foundation for message allocation among agents powered by LLMs. DALA formalizes the principle that inter-agent communication bandwidth is a scarce, valuable, and tradable resource, and applies auction-theoretic mechanisms to allocate communication rights efficiently, foster strategic silence, and optimize message informativity per cost. DALA has achieved state-of-the-art performance on demanding reasoning and code-generation benchmarks while sharply reducing operational token expenses (Fan et al., 17 Nov 2025). The concept extends to dynamic auction environments where agents must plan and bid adaptively under budget constraints (Chen et al., 2023), and incorporates foundational lessons from large-scale online auction bidding with deep RL agents (Wang et al., 2017).

1. Formal Problem Statement and Communication Model

In core DALA, a population of $N$ LLM-based agents $\{a_i\}_{i=1}^N$ engages in collaborative problem-solving via the exchange of natural language messages over $T$ rounds. Crucially, all communication is measured in tokens, subject to both episode-level ($B_\text{episode}$) and round-level ($B(t)$) budget constraints:

$\mathbb{E}\left[\sum_{t=1}^T \sum_{i\in \mathcal{W}_t} L\big(m_i(t)\big)\right] \leq B_\text{episode}$

where $\mathcal{W}_t$ is the selected set of "speaking" agents in round $t$, and $L(m)$ gives the token length of message $m$.

The central objective is to learn agent policies $\theta=\{\theta_i\}_{i=1}^N$ that maximize expected team task return, subject to the communication resource constraint:

$\max_{\theta}\; \mathbb{E}_{\pi_\theta}[R(T)] \quad \text{s.t.} \quad \mathbb{E}_{\pi_\theta}\left[\sum_{t=1}^T \sum_{i\in \mathcal{W}_t} L(m_i(t))\right] \leq B_\text{episode}$

This contrasts sharply with prior unconstrained communication protocols, which lead to combinatorial message explosion, token inefficiency, and low signal-to-noise ratios (Fan et al., 17 Nov 2025).
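Because the budget constraint is stated in expectation, it can only be checked empirically as an average over sampled episodes. The following minimal sketch illustrates that accounting; the nested-list episode format and the function names are illustrative assumptions, not part of the published method.

```python
from statistics import mean

def episode_tokens(episode):
    """Total tokens spent in one episode.

    `episode` is a list of rounds; each round is a list holding the token
    lengths L(m_i(t)) of the winning messages only (assumed format).
    """
    return sum(sum(round_lengths) for round_lengths in episode)

def satisfies_budget(episodes, b_episode):
    """Monte Carlo check of E[sum_t sum_{i in W_t} L(m_i(t))] <= B_episode."""
    return mean([episode_tokens(e) for e in episodes]) <= b_episode

# Two sampled episodes with two rounds each (hypothetical numbers).
episodes = [
    [[10, 20], [5]],    # 35 tokens in total
    [[40], [30, 10]],   # 80 tokens in total
]
print(satisfies_budget(episodes, b_episode=60))  # average is 57.5
```

Note that a single episode may exceed $B_\text{episode}$ (the second one does here) while the expectation-level constraint still holds.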

2. Auction Mechanism: VCG Model and Bid Computation

DALA employs a centralized combinatorial Vickrey-Clarke-Groves (VCG) auction at each round. Each agent $i$ submits a candidate message $m_i(t)$ with an accompanying bid $b_i(t)$. Let $B_\text{valid}$ denote the set of valid (budget-feasible) submissions $(i, m, b)$; note that it is a set of submissions, not a budget. The winner determination problem (WDP) is to select the subset $\mathcal{W}_t$ of valid submissions that maximizes the total bid subject to the per-round token cap $B^{\max}$:

$\mathcal{W}_t = \arg\max_{\mathcal{W}\subseteq B_\text{valid}} \sum_{(i, m, b)\in \mathcal{W}} b \quad \text{s.t.} \quad \sum_{(i,m,b)\in \mathcal{W}} L(m) \leq B^{\max}$

Payments are set according to the VCG rule, so that each winner $i$ pays

$p_i(t) = \max_{\mathcal{W}^\prime \subseteq B_\text{valid}\setminus\{i\}} \sum_{j\in \mathcal{W}^\prime} b_j - \sum_{j\in \mathcal{W}_t\setminus\{i\}} b_j$

Under quasi-linear utilities, the VCG rule makes truthful bidding a dominant strategy, provided the WDP is solved exactly.
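The winner determination problem above has the form of a 0/1 knapsack, so for integer token lengths it can be solved exactly by dynamic programming, and each VCG payment follows from re-solving the problem with one winner removed. The sketch below illustrates this under simplifying assumptions: each submission is reduced to a `(bid, token_length)` pair, and the function names `solve_wdp` and `vcg_payments` are hypothetical rather than taken from the paper.

```python
from typing import Dict, List, Tuple

def solve_wdp(bids: List[Tuple[float, int]], budget: int) -> Tuple[float, List[int]]:
    """Maximize total bid subject to total token length <= budget.

    bids[k] = (bid, token_length). Returns (best_total, winner_indices).
    """
    n = len(bids)
    # best[k][c]: best total bid using the first k submissions within capacity c
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        bid, length = bids[k - 1]
        for c in range(budget + 1):
            best[k][c] = best[k - 1][c]
            if length <= c and best[k - 1][c - length] + bid > best[k][c]:
                best[k][c] = best[k - 1][c - length] + bid
    # Backtrack to recover which submissions were selected.
    winners, c = [], budget
    for k in range(n, 0, -1):
        if best[k][c] != best[k - 1][c]:
            winners.append(k - 1)
            c -= bids[k - 1][1]
    return best[n][budget], sorted(winners)

def vcg_payments(bids: List[Tuple[float, int]], budget: int) -> Tuple[List[int], Dict[int, float]]:
    """Each winner pays the welfare loss its presence imposes on the others."""
    total, winners = solve_wdp(bids, budget)
    payments = {}
    for i in winners:
        others = bids[:i] + bids[i + 1:]
        total_without_i, _ = solve_wdp(others, budget)
        payments[i] = total_without_i - (total - bids[i][0])
    return winners, payments

# Three submissions competing for a 10-token round (hypothetical numbers).
winners, payments = vcg_payments([(10, 6), (8, 5), (7, 5)], budget=10)
print(winners, payments)
```

In this example the two shorter messages jointly outbid the single higher bid, and each winner pays less than its own bid (3 and 2 against bids of 8 and 7), which is the property that removes the incentive to shade bids.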

Agent bids are determined by a message-conditioned value function $V_i(m, o)$, predicting the agent's marginal value to the team for emitting $m$ under local observation $o$:

$V_i(m, o) = g_{\phi_i}\big(\mathrm{Enc}_m(m) + \mathrm{Enc}_o(o)\big)$

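Value estimates are normalized by Z-scoring, with mean $\mu^t$ and standard deviation $\sigma^t$, and then divided by token count to yield a per-token value density; the bid is this density clipped at zero: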

$\hat{V}_i(m, o) = \frac{V_i(m, o) - \mu^t}{\sigma^t + \varepsilon}, \qquad p_i(m, o) = \frac{\hat{V}_i(m, o)}{L(m)}, \qquad b_i(t) = \max\{0, p_i(m_i(t), o_i(t))\}$

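Tiered communication modes (full message, summary, keywords, silence) are chosen based on the value density $p_i(m, o)$ crossing preset thresholds (Fan et al., 17 Nov 2025). This density is distinct from the VCG payment $p_i(t)$ defined earlier, although both share the symbol $p_i$.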
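The normalization, density, and tier-selection steps can be sketched as follows. Several details are assumptions made for illustration only: the statistics $\mu^t$ and $\sigma^t$ are computed across the agents' value estimates within one round, higher density maps to a fuller communication mode, and the threshold values are arbitrary placeholders.

```python
from statistics import mean, pstdev

def value_density_bids(values, lengths, eps=1e-8):
    """Z-score the raw value estimates, divide by token length, clip at zero.

    values[i]  : raw critic estimate V_i(m, o) for agent i's candidate message
    lengths[i] : token length L(m) of that message (must be positive)
    Returns (densities, bids).
    """
    mu, sigma = mean(values), pstdev(values)  # assumed: statistics over this round
    densities = [((v - mu) / (sigma + eps)) / n for v, n in zip(values, lengths)]
    bids = [max(0.0, d) for d in densities]
    return densities, bids

def choose_tier(density, thresholds=(0.1, 0.05, 0.0)):
    """Map a value density to a communication mode (placeholder thresholds)."""
    full, summary, keywords = thresholds
    if density > full:
        return "full"
    if density > summary:
        return "summary"
    if density > keywords:
        return "keywords"
    return "silence"

densities, bids = value_density_bids([1.0, 2.0, 3.0], [10, 20, 10])
print([choose_tier(d) for d in densities])
```

Agents whose estimate falls at or below the round average receive a zero bid and default to silence, which is one simple way the clipping in $b_i(t)$ can translate into strategic silence.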

3. Agent Training, Bidding Strategies, and Emergent Behaviors

DALA agents are optimized using Multi-Agent Proximal Policy Optimization (MAPPO). Each agent's actor network proposes $(m_i, b_i)$ and its critic estimates $V_i$. The per-agent reward combines the marginal task gain $\Delta R_\text{task}$ with a communication cost penalty proportional to the agent's VCG payment $p_i(t)$, charged only when the agent wins a speaking slot:

$r_i(t) = \alpha\,\Delta R_\text{task}(t) - \beta\,p_i(t)\,\mathbf{1}_{\{i\in\mathcal{W}_t\}}$

The overall loss aggregates policy loss, value loss, and an entropy regularization bonus.
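As a concrete reading of the reward $r_i(t)$, the short sketch below evaluates it for a winning and a losing agent. The default values of `alpha` and `beta` are placeholders chosen for illustration; the source does not state the coefficients used.

```python
def agent_reward(delta_task, payment, is_winner, alpha=1.0, beta=0.1):
    """r_i(t) = alpha * dR_task(t) - beta * p_i(t) * 1[i in W_t]."""
    indicator = 1.0 if is_winner else 0.0
    return alpha * delta_task - beta * payment * indicator

# Same task gain, same hypothetical VCG payment of 3.0:
speaker = agent_reward(delta_task=0.5, payment=3.0, is_winner=True)
listener = agent_reward(delta_task=0.5, payment=3.0, is_winner=False)
print(speaker, listener)
```

Since the task gain is shared while the payment is charged only to winners, an agent is better off speaking only when its message raises $\Delta R_\text{task}$ by more than $\beta\,p_i(t)/\alpha$.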

Resource rationality and strategic silence emerge: agents internalize the cost of verbosity and learn to remain silent (or send minimal summaries or keywords) when a message has low utility. Behavior adapts to the available per-round budget: under tighter constraints, the proportion of full messages collapses and strategic silence rises (Fan et al., 17 Nov 2025).

4. System Architecture and Implementation

A typical DALA system comprises:

Agent modules: Text encoders, LLM-based actors (e.g., GPT-4) outputting proposed messages and token-importance policies, and critic networks estimating value for $(m, o)$ pairs.
Centralized Auctioneer: Validates message/bid tuples against the round budget, solves the combinatorial WDP (typically via dynamic programming), computes VCG payments, and updates communication budgets.
Communication Protocol: Agents receive global state, submit (message, bid), and the auctioneer awards speech rights; winners broadcast content, after which observations are updated for the next round.

This architecture enables fine-grained control over message length, format, and content type under dynamic, context-sensitive budget limitations.
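One round of the communication protocol described above might be organized as in the following sketch. It is illustrative only: `Submission`, `count_tokens`, and `run_round` are hypothetical names, whitespace splitting stands in for a real tokenizer, and the default `greedy_select` is a simple density-ordered heuristic used as a placeholder for the exact VCG winner determination, not the mechanism itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Submission:
    agent_id: int
    message: str
    bid: float

def count_tokens(message: str) -> int:
    # Stand-in tokenizer: whitespace split (a real system would use the LLM's tokenizer).
    return len(message.split())

def greedy_select(subs: List[Submission], cap: int) -> List[Submission]:
    # Placeholder for the WDP solver: take submissions by bid per token while they fit.
    chosen, used = [], 0
    for s in sorted(subs, key=lambda s: s.bid / count_tokens(s.message), reverse=True):
        n = count_tokens(s.message)
        if used + n <= cap:
            chosen.append(s)
            used += n
    return chosen

def run_round(subs: List[Submission], round_budget: int, episode_left: int,
              select: Callable[[List[Submission], int], List[Submission]] = greedy_select
              ) -> Tuple[Dict[int, str], int]:
    """Validate submissions, award speech rights, and update the episode budget."""
    cap = min(round_budget, episode_left)
    # Validation: drop empty, zero-bid, and over-budget submissions.
    valid = [s for s in subs if s.bid > 0 and 0 < count_tokens(s.message) <= cap]
    winners = select(valid, cap)
    spent = sum(count_tokens(w.message) for w in winners)
    broadcast = {w.agent_id: w.message for w in winners}  # winners' content goes to all agents
    return broadcast, episode_left - spent

subs = [
    Submission(0, "the answer is 42", 2.0),
    Submission(1, "I think we should double check the arithmetic first", 1.0),
    Submission(2, "", 0.0),  # silence
]
broadcast, left = run_round(subs, round_budget=6, episode_left=100)
print(broadcast, left)
```

Passing the selection rule as a parameter keeps the protocol loop independent of the auction solver, so an exact dynamic-programming WDP with VCG payments could replace the greedy placeholder without changing the rest of the round logic.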

5. Applications and Empirical Evaluation

DALA has been applied to seven advanced benchmarks, including general knowledge reasoning (MMLU), math problems (GSM8K, MultiArith, SVAMP, AQUA, MATH-500), and code generation (HumanEval).

Task	DALA Accuracy / Pass@1	DALA Token Usage (tokens)	Previous SOTA (accuracy / tokens)
MMLU	84.32%	1.81×10⁵	PHP: 83.45% / 2.60×10⁶
GSM8K	96.18%	6.25×10⁶	DyLAN: 95.83% / 1.40×10⁷
HumanEval	91.21% pass@1	<10⁶	Various; typically lower accuracy

DALA consistently occupies the high-accuracy, low-cost Pareto frontier on these tasks (Fan et al., 17 Nov 2025). Experiments demonstrate that the combination of value-function learning, value-density bidding, tiered message types, and dynamic budgets is essential for optimal performance and cost control.
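The size of the token savings follows directly from the reported figures for the two rows that list a baseline token count. The short calculation below uses only numbers from the table; the helper names are illustrative.

```python
# (DALA tokens, previous-SOTA tokens) as reported for each task
reported = {
    "MMLU": (1.81e5, 2.60e6),    # baseline: PHP
    "GSM8K": (6.25e6, 1.40e7),   # baseline: DyLAN
}

def reduction_factor(dala_tokens, baseline_tokens):
    """How many times fewer tokens DALA uses than the baseline."""
    return baseline_tokens / dala_tokens

def fraction_saved(dala_tokens, baseline_tokens):
    """Share of the baseline's tokens that DALA avoids spending."""
    return 1.0 - dala_tokens / baseline_tokens

for task, (dala, base) in reported.items():
    print(f"{task}: {reduction_factor(dala, base):.1f}x fewer tokens, "
          f"{fraction_saved(dala, base):.0%} saved")
```

On these figures DALA uses roughly 14.4 times fewer tokens than PHP on MMLU (about 93% saved) and roughly 2.2 times fewer than DyLAN on GSM8K (about 55% saved), while reporting slightly higher accuracy in both cases.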

6. Extensions: Strategic Planning in Dynamic Auction Environments

DALA generalizes to rich, dynamic auction settings, such as the AucArena environment, where LLM agents must optimize for resource allocation, risk, and goal adherence across multiple, sequential, open-outcry auctions (Chen et al., 2023). Agents implement a Belief–Desire–Intention (BDI) loop:

Planning: Prioritize items via learned/reasoned rankings.
Bid Generation: Conditioned on current state, plan, and beliefs, generate incremental bids or withdraw.
Budget and Risk Management: Monitor dynamic consumption, aggressiveness, and opponent behaviors.
Belief Update and Replanning: Use structured updates and prompt engineering to adjust future strategies.
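The four stages of the loop can be illustrated with a deliberately simple rule-based bidder in a sequential ascending auction. This is a sketch under strong assumptions: fixed numeric value estimates stand in for the LLM's reasoning, the rival is a scripted opponent with a known price limit, and all names (`BDIBidder`, `run_auction`) are hypothetical rather than part of AucArena.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class BDIBidder:
    budget: float
    values: Dict[str, float]                       # beliefs: estimated worth per item
    plan: List[str] = field(default_factory=list)  # intentions: prioritized items
    won: List[str] = field(default_factory=list)
    observed: Dict[str, float] = field(default_factory=dict)

    def replan(self, remaining: List[str]) -> None:
        """Planning: rank the remaining items by estimated value."""
        self.plan = sorted((it for it in remaining if self.values[it] > 0),
                           key=lambda it: self.values[it], reverse=True)

    def bid(self, item: str, price: float, step: float = 1.0) -> Optional[float]:
        """Bid generation: raise by one step, or return None to withdraw."""
        offer = price + step
        limit = min(self.values[item], self.budget)  # budget and risk guard
        return offer if item in self.plan and offer <= limit else None

    def update(self, item: str, i_won: bool, price: float) -> None:
        """Belief update: record the hammer price; pay if the item was won."""
        self.observed[item] = price
        if i_won:
            self.budget -= price
            self.won.append(item)

def run_auction(bidder: BDIBidder, items: Dict[str, float],
                rival_limits: Dict[str, float], step: float = 1.0) -> None:
    remaining = list(items)
    for item, price in items.items():
        bidder.replan(remaining)
        leader = None
        while True:
            progressed = False
            if leader != "me":
                offer = bidder.bid(item, price, step)
                if offer is not None:
                    price, leader, progressed = offer, "me", True
            if leader != "rival" and price + step <= rival_limits[item]:
                price, leader, progressed = price + step, "rival", True
            if not progressed:
                break
        bidder.update(item, leader == "me", price)
        remaining.remove(item)

bidder = BDIBidder(budget=20.0, values={"A": 15.0, "B": 12.0})
run_auction(bidder, items={"A": 10.0, "B": 5.0}, rival_limits={"A": 13.0, "B": 9.0})
print(bidder.won, bidder.budget)
```

In this run the bidder wins item A at 13 and then withdraws from item B once the price exceeds its remaining budget of 7, even though it values B at 12. A fuller agent would feed the recorded prices in `observed` back into its value estimates before replanning.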

Evaluation frameworks use TrueSkill μ-score, Corrected Failure Rate, and plan-execution correlation; the reported results indicate that LLM-based agents can adapt strategically and stay within budget in unpredictable, multi-round auctions.

7. Limitations and Future Directions

Scalability: MAPPO with VCG auctions is resource-intensive for large $N$; sample-efficient MARL or approximation strategies are an area for further research.
Centralization: Current implementations rely on a centralized auctioneer; decentralized and peer-to-peer market protocols are under-explored.
Agent Homogeneity: All agents are currently assumed identical. Heterogeneous teams with variable LLMs, skills, or cost structures introduce new challenges in equilibrium and fairness.
Extensibility: Extension to other scarce resources (oracle queries, multi-modal sensor bandwidth), combinatorial or sealed-bid auctions, and reinforcement learning fine-tuning are fruitful avenues (Fan et al., 17 Nov 2025, Chen et al., 2023).
Integration with High-Frequency Environments: DALA inherits lessons from dynamic online ad auctions (e.g., LADDER (Wang et al., 2017)), combining full asynchrony, domain-based augmentation, and efficient language representation.

DALA redefines multi-agent language interaction as an economic allocation problem, enforcing resource rationality through explicit market-like mechanisms that simultaneously drive strategic efficiency, informativity, and scalability in LLM-powered systems.