
Dynamic Auction-based Language Agent (DALA)

Updated 24 November 2025
  • The paper introduces an auction-based paradigm using VCG mechanisms to allocate token-based communication among LLM agents efficiently.
  • It employs MAPPO for agent training, which guides strategic message generation and bidding to balance cost and performance.
  • Empirical evaluations demonstrate state-of-the-art reasoning and code-generation performance while significantly reducing token usage.

The Dynamic Auction-based Language Agent (DALA) is a methodological paradigm for multi-agent communication and decision-making that operationalizes auctions as a foundation for message allocation among agents powered by LLMs. DALA formalizes the principle that inter-agent communication bandwidth is a scarce, valuable, and tradable resource, and applies auction-theoretic mechanisms to allocate communication rights efficiently, foster strategic silence, and optimize message informativity per cost. DALA has achieved state-of-the-art performance on demanding reasoning and code-generation benchmarks while sharply reducing operational token expenses (Fan et al., 17 Nov 2025). The concept extends to dynamic auction environments where agents must plan and bid adaptively under budget constraints (Chen et al., 2023), and incorporates foundational lessons from large-scale online auction bidding with deep RL agents (Wang et al., 2017).

1. Formal Problem Statement and Communication Model

In core DALA, a population of $N$ LLM-based agents $\{a_i\}_{i=1}^N$ engages in collaborative problem-solving via the exchange of natural language messages over $T$ rounds. Crucially, all communication is measured in tokens, subject to both episode-level ($B_\text{episode}$) and round-level ($B(t)$) budget constraints:

$\mathbb{E}\left[\sum_{t=1}^T \sum_{i\in \mathcal{W}_t} L\big(m_i(t)\big)\right] \leq B_\text{episode}$

where $\mathcal{W}_t$ is the selected set of "speaking" agents in round $t$, and $L(m)$ gives the token length of message $m$.

The central objective is to learn agent policies $\theta=\{\theta_i\}_{i=1}^N$ that maximize expected team task return, subject to the communication resource constraint:

$\max_{\theta}\; \mathbb{E}_{\pi_\theta}[R(T)] \quad \text{s.t.} \quad \mathbb{E}_{\pi_\theta}\left[\sum_{t=1}^T \sum_{i\in \mathcal{W}_t} L(m_i(t))\right] \leq B_\text{episode}$

This contrasts sharply with prior unconstrained communication protocols, which lead to combinatorial message explosion, token inefficiency, and low signal-to-noise ratios (Fan et al., 17 Nov 2025).
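The token accounting behind the episode-level constraint can be illustrated with a running tally. This is a minimal sketch, not the paper's implementation: a whitespace split stands in for the token-length function $L(m)$, the function name `episode_token_usage` is hypothetical, and the check covers one realized episode, whereas the constraint itself holds in expectation.

```python
def episode_token_usage(rounds, token_len=lambda m: len(m.split())):
    """Sum L(m_i(t)) over all rounds t and all speaking agents i in W_t.

    `rounds` is a list with one dict per round, mapping agent id -> message.
    The default whitespace tokenizer is an assumption made for illustration.
    """
    return sum(token_len(msg) for winners in rounds for msg in winners.values())


rounds = [
    {0: "the answer is 42", 2: "check the units"},  # W_1 = {0, 2}
    {1: "agreed"},                                   # W_2 = {1}
]
B_EPISODE = 10  # hypothetical episode budget in tokens

used = episode_token_usage(rounds)
within_budget = used <= B_EPISODE
print(used, within_budget)
```

Averaging `used` over many sampled episodes would give an empirical estimate of the expectation on the left-hand side of the constraint.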

2. Auction Mechanism: VCG Model and Bid Computation

DALA employs a centralized combinatorial Vickrey-Clarke-Groves (VCG) auction at each round. Each agent $i$ submits a candidate message $m_i(t)$ with an accompanying bid $b_i(t)$. The winner determination problem (WDP) is to select the subset $\mathcal{W}_t$ of valid (budget-feasible) messages maximizing total bid:

$\mathcal{W}_t = \arg\max_{\mathcal{W}\subseteq B_\text{valid}} \sum_{(i, m, b)\in \mathcal{W}} b \quad \text{s.t.} \quad \sum_{(i,m,b)\in \mathcal{W}} L(m) \leq B^{\max}$

Payments are set according to the VCG rule, so that each winner $i$ pays

$p_i(t) = \max_{\mathcal{W}^\prime \subseteq B_\text{valid}\setminus\{i\}} \sum_{j\in \mathcal{W}^\prime} b_j - \sum_{j\in \mathcal{W}_t\setminus\{i\}} b_j$

Each winner thus pays the loss its presence imposes on the other bidders. Under quasi-linear utilities, this payment rule makes truthful bidding a dominant strategy.
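To make the winner determination and payment rule concrete, the sketch below treats the WDP as a 0/1 knapsack over token lengths and computes each winner's VCG payment by re-solving the problem without that agent. It assumes integer token lengths and one message per agent; the function names `solve_wdp` and `vcg_payments` are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple

Bid = Tuple[int, int, float]  # (agent_id, token_length, bid)


def solve_wdp(bids: List[Bid], budget: int) -> Tuple[float, List[int]]:
    """0/1 knapsack by dynamic programming: maximize total bid within the token budget."""
    # best[c] holds (total_bid, winner_ids) achievable with at most c tokens.
    best: List[Tuple[float, List[int]]] = [(0.0, [])] * (budget + 1)
    for agent, length, bid in bids:
        # Iterate capacities downward so each message is used at most once.
        for c in range(budget, length - 1, -1):
            candidate = best[c - length][0] + bid
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - length][1] + [agent])
    return best[budget]


def vcg_payments(bids: List[Bid], budget: int) -> Tuple[List[int], Dict[int, float]]:
    """Each winner pays: (best welfare without it) - (welfare of the other winners)."""
    total, winners = solve_wdp(bids, budget)
    bid_of = {agent: bid for agent, _, bid in bids}
    payments = {}
    for i in winners:
        others = [b for b in bids if b[0] != i]
        welfare_without_i, _ = solve_wdp(others, budget)
        payments[i] = welfare_without_i - (total - bid_of[i])
    return winners, payments


# Three candidate messages competing for a 100-token round budget.
bids = [(0, 60, 9.0), (1, 50, 6.0), (2, 50, 5.0)]
winners, payments = vcg_payments(bids, budget=100)
print(winners, payments)
```

In this example agents 1 and 2 win together (total bid 11 beats agent 0's bid of 9), and each pays less than its own bid: agent 1 pays 4 and agent 2 pays 3, the amounts by which their presence displaces agent 0.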

Agent bids are determined by a message-conditioned value function $V_i(m, o)$, predicting the agent's marginal value to the team for emitting $m$ under local observation $o$:

$V_i(m, o) = g_{\phi_i}\big(\mathrm{Enc}_m(m) + \mathrm{Enc}_o(o)\big)$

Values are z-score normalized and divided by token count to yield a per-token value density, and the bid is that density clipped at zero:

$\hat{V}_i(m, o) = \frac{V_i(m, o) - \mu^t}{\sigma^t + \varepsilon}$

$p_i(m, o) = \frac{\hat{V}_i(m, o)}{L(m)}$

$b_i(t) = \max\{0, p_i(m_i(t), o_i(t))\}$

Here $p_i(m, o)$ denotes the value density and is distinct from the VCG payment $p_i(t)$ defined earlier. Tiered communication modes (full message, summary, keywords, silence) are chosen based on the density $p_i$ crossing preset thresholds (Fan et al., 17 Nov 2025).
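The normalization and tiering steps can be sketched as follows. Several details here are assumptions rather than facts from the source: the mean and standard deviation are taken over the values submitted in the current round (population standard deviation), the threshold numbers in `TIERS` are invented for illustration, and the names `compute_bids` and `choose_mode` are hypothetical.

```python
import statistics

EPS = 1e-8

# Hypothetical density thresholds; the source only states that thresholds are preset.
TIERS = [(0.02, "full"), (0.01, "summary"), (0.0, "keywords")]


def compute_bids(values, lengths, eps=EPS):
    """Z-score the raw values, divide by token length, and clip at zero."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumption: population std over this round
    bids = []
    for v, n in zip(values, lengths):
        v_hat = (v - mu) / (sigma + eps)   # normalized value
        density = v_hat / n                # per-token value density
        bids.append(max(0.0, density))     # negative densities bid zero
    return bids


def choose_mode(density):
    """Map a value density to a communication tier; fall back to silence."""
    for threshold, mode in TIERS:
        if density > threshold:
            return mode
    return "silence"


values = [3.0, 1.0, 2.0]   # critic outputs V_i(m, o) for three agents
lengths = [40, 20, 80]     # token lengths L(m)
bids = compute_bids(values, lengths)
modes = [choose_mode(b) for b in bids]
print(bids, modes)
```

Dividing by length means a long message must carry proportionally more value to outbid a short one, which is what pushes low-value agents toward summaries, keywords, or silence.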

3. Agent Training, Bidding Strategies, and Emergent Behaviors

DALA agents are optimized using Multi-Agent Proximal Policy Optimization (MAPPO). Each agent's actor network proposes $(m_i, b_i)$ and its critic estimates $V_i$. The system-level reward for each agent combines the marginal task gain $\Delta R_\text{task}$ and a communication cost penalty proportional to its VCG payment:

$r_i(t) = \alpha\,\Delta R_\text{task}(t) - \beta\,p_i(t)\,\mathbf{1}_{\{i\in\mathcal{W}_t\}}$

The overall loss aggregates policy loss, value loss, and an entropy regularization bonus.
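As a small numerical illustration of the reward and the aggregated loss, the sketch below uses made-up coefficient values. The defaults for `alpha`, `beta`, `c_value`, and `c_entropy` are assumptions, and the loss follows the usual PPO sign convention (entropy subtracted), which the source does not spell out.

```python
def agent_reward(delta_task, payment, is_winner, alpha=1.0, beta=0.1):
    """r_i(t) = alpha * dR_task(t) - beta * p_i(t) * 1[i in W_t]."""
    indicator = 1.0 if is_winner else 0.0
    return alpha * delta_task - beta * payment * indicator


def total_loss(policy_loss, value_loss, entropy, c_value=0.5, c_entropy=0.01):
    """Policy loss plus weighted value loss, minus a weighted entropy bonus."""
    # Subtracting entropy means minimizing the loss rewards more exploratory policies.
    return policy_loss + c_value * value_loss - c_entropy * entropy


# An agent that won the auction pays for speaking; one that stayed silent does not.
winner_reward = agent_reward(delta_task=0.5, payment=4.0, is_winner=True)
silent_reward = agent_reward(delta_task=0.5, payment=4.0, is_winner=False)
print(winner_reward, silent_reward)
```

With these numbers the silent agent keeps the full shared task gain while the speaker's reward is reduced by its payment, so speaking only pays off when the message raises the task gain by more than it costs.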

Resource rationality and strategic silence emerge: agents internalize the cost of verbosity and learn to remain silent (or send minimal summaries/keywords) when a message would contribute little. Behavior also adapts to the per-round budget: under tighter constraints, the proportion of full messages collapses and strategic silence rises (Fan et al., 17 Nov 2025).

4. System Architecture and Implementation

A typical DALA system comprises:

  • Agent modules: Text encoders, LLM-based actors (e.g., GPT-4) outputting proposed messages and token-importance policies, and critic networks estimating value for $(m, o)$ pairs.
  • Centralized Auctioneer: Validates message/bid tuples against round budget, solves combinatorial WDP (typically via dynamic programming), computes VCG payments, and updates communication budgets.
  • Communication Protocol: Agents receive global state, submit (message, bid), and the auctioneer awards speech rights; winners broadcast content, after which observations are updated for the next round.

This architecture enables fine-grained control over message length, format, and content type under dynamic, context-sensitive budget limitations.
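One round of the protocol described above might look like the following sketch. It is illustrative only: the class and field names are hypothetical, a whitespace split stands in for the tokenizer, exhaustive subset search stands in for the dynamic-programming WDP solver (practical only for a handful of agents), and VCG payments are omitted to keep the focus on validation, selection, broadcast, and budget updates.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Submission:
    agent: int
    message: str
    bid: float


@dataclass
class Auctioneer:
    episode_budget: int
    round_budget: int
    transcript: list = field(default_factory=list)

    @staticmethod
    def length(message: str) -> int:
        return len(message.split())  # stand-in for a real tokenizer

    def run_round(self, submissions):
        # The round cap can never exceed what is left in the episode budget.
        cap = min(self.round_budget, self.episode_budget)
        # Validate: drop zero bids (silence) and messages that cannot fit.
        valid = [s for s in submissions if s.bid > 0 and self.length(s.message) <= cap]
        best, best_value = (), 0.0
        for k in range(1, len(valid) + 1):
            for subset in combinations(valid, k):
                cost = sum(self.length(s.message) for s in subset)
                value = sum(s.bid for s in subset)
                if cost <= cap and value > best_value:
                    best, best_value = subset, value
        # Winners broadcast; the remaining episode budget shrinks accordingly.
        self.episode_budget -= sum(self.length(s.message) for s in best)
        self.transcript.extend((s.agent, s.message) for s in best)
        return [s.agent for s in best]


auctioneer = Auctioneer(episode_budget=12, round_budget=6)
winners = auctioneer.run_round([
    Submission(0, "use induction on n", 0.9),
    Submission(1, "I think the base case is trivially true here", 0.5),
    Submission(2, "check n=1", 0.4),
    Submission(3, "", 0.0),  # strategic silence
])
print(winners, auctioneer.episode_budget)
```

Agent 1's nine-token message exceeds the six-token round cap and is rejected at validation, so agents 0 and 2 share the round and six tokens remain for the rest of the episode.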

5. Applications and Empirical Evaluation

DALA has been applied to seven advanced benchmarks, including general knowledge reasoning (MMLU), math problems (GSM8K, MultiArith, SVAMP, AQUA, MATH-500), and code generation (HumanEval).

| Task | DALA accuracy / pass@1 | DALA token usage | Previous SOTA (accuracy / tokens) |
|---|---|---|---|
| MMLU | 84.32% | 1.81×10⁵ | PHP: 83.45% / 2.60×10⁶ |
| GSM8K | 96.18% | 6.25×10⁶ | DyLAN: 95.83% / 1.40×10⁷ |
| HumanEval | 91.21% (pass@1) | <10⁶ | Various; typically lower accuracy |

DALA consistently occupies the high-accuracy, low-cost Pareto frontier on these tasks (Fan et al., 17 Nov 2025). Experiments demonstrate that the combination of value-function learning, value-density bidding, tiered message types, and dynamic budgets is essential for optimal performance and cost control.
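The size of the savings follows directly from the table's figures. The short calculation below uses only the MMLU and GSM8K rows, since the HumanEval row gives no exact baseline; the variable names are illustrative.

```python
# (DALA tokens, baseline tokens, DALA accuracy %, baseline accuracy %) from the table.
results = {
    "MMLU": (1.81e5, 2.60e6, 84.32, 83.45),
    "GSM8K": (6.25e6, 1.40e7, 96.18, 95.83),
}

for task, (dala_tok, base_tok, dala_acc, base_acc) in results.items():
    reduction = base_tok / dala_tok          # how many times fewer tokens DALA uses
    gain = round(dala_acc - base_acc, 2)     # accuracy gain in percentage points
    print(f"{task}: {reduction:.1f}x fewer tokens, +{gain} points")

token_reduction = {task: r[1] / r[0] for task, r in results.items()}
```

On these numbers, DALA uses roughly 14 times fewer tokens than PHP on MMLU and a little over 2 times fewer than DyLAN on GSM8K, with accuracy gains under one percentage point in both cases.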

6. Extensions: Strategic Planning in Dynamic Auction Environments

DALA generalizes to rich, dynamic auction settings, such as the AucArena environment, where LLM agents must optimize for resource allocation, risk, and goal adherence across multiple, sequential, open-outcry auctions (Chen et al., 2023). Agents implement a Belief–Desire–Intention (BDI) loop:

  • Planning: Prioritize items via learned/reasoned rankings.
  • Bid Generation: Conditioned on current state, plan, and beliefs, generate incremental bids or withdraw.
  • Budget and Risk Management: Monitor dynamic consumption, aggressiveness, and opponent behaviors.
  • Belief Update and Replanning: Use structured updates and prompt engineering to adjust future strategies.

Evaluation frameworks leverage TrueSkill μ-score, Corrected Failure Rate, and plan-execution correlation, confirming LLM-based DALA agents' proficiency at strategic adaptation and budget adherence in unpredictable, multi-round auctions.
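The planning, bidding, and replanning loop listed above can be caricatured with fixed rules. The sketch below is heavily simplified and entirely hypothetical: in the cited work these steps are carried out by prompted LLMs, whereas here a value table, a `top_k` priority cut-off, and a fixed bid increment replace the model's reasoning. It does not reproduce AucArena's rules or prompts.

```python
from dataclasses import dataclass, field


@dataclass
class BDIBidder:
    budget: float
    values: dict                      # belief: item -> estimated value
    top_k: int = 2                    # how many items the plan prioritizes
    plan: list = field(default_factory=list)

    def replan(self, remaining_items):
        """Planning: rank remaining items by estimated value and keep the top k."""
        ranked = sorted(remaining_items, key=lambda it: self.values[it], reverse=True)
        self.plan = ranked[: self.top_k]

    def bid(self, item, current_price, increment=1.0):
        """Bid generation: raise only on planned items that stay affordable and worthwhile."""
        offer = current_price + increment
        if item in self.plan and offer <= min(self.budget, self.values[item]):
            return offer
        return None  # withdraw

    def observe_result(self, item, won, price, remaining_items):
        """Belief update: deduct spending if we won, then replan over what is left."""
        if won:
            self.budget -= price
        self.replan([it for it in remaining_items if it != item])


bidder = BDIBidder(budget=10.0, values={"A": 8.0, "B": 5.0, "C": 3.0})
bidder.replan(["A", "B", "C"])
first_offer = bidder.bid("A", current_price=6.0)      # planned and affordable
skipped = bidder.bid("C", current_price=1.0)          # not in the current plan
bidder.observe_result("A", won=True, price=7.0, remaining_items=["A", "B", "C"])
over_budget = bidder.bid("B", current_price=2.5)      # offer would exceed remaining budget
late_offer = bidder.bid("C", current_price=1.0)       # now planned after replanning
print(first_offer, skipped, over_budget, late_offer)
```

After winning item A the bidder's remaining budget drops to 3, so it withdraws from B but re-enters on C, which only became a priority after replanning.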

7. Limitations and Future Directions

  • Scalability: MAPPO with VCG auctions is resource-intensive for large $N$; sample-efficient MARL or approximation strategies are an area for further research.
  • Centralization: Current implementations rely on a centralized auctioneer; decentralized and peer-to-peer market protocols are under-explored.
  • Agent Homogeneity: All agents are currently assumed identical. Heterogeneous teams with variable LLMs, skills, or cost structures introduce new challenges in equilibrium and fairness.
  • Extensibility: Extensions to other scarce resources (oracle queries, multi-modal sensor bandwidth), to combinatorial or sealed-bid auctions, and to reinforcement learning fine-tuning are fruitful avenues (Fan et al., 17 Nov 2025, Chen et al., 2023).
  • Integration with High-Frequency Environments: DALA inherits lessons from dynamic online ad auctions (e.g., LADDER (Wang et al., 2017)), combining full asynchrony, domain-based augmentation, and efficient language representation.

DALA redefines multi-agent language interaction as an economic allocation problem, enforcing resource rationality through explicit market-like mechanisms that simultaneously drive strategic efficiency, informativity, and scalability in LLM-powered systems.
