
AI Agents as Economic Actors

Updated 28 October 2025
  • AI Agents as Economic Actors are autonomous, goal-driven entities using reinforcement learning to simulate and optimize decisions in complex economic environments.
  • Multi-agent training via Policy Space Response Oracles (PSRO) achieves near-Nash equilibria that independent multi-agent RL (IMARL) fails to reach, yielding robust macroeconomic outcomes and competitive strategies.
  • Despite higher computational requirements, these approaches provide actionable insights for policy design by incorporating agent heterogeneity and adaptive strategy optimization.

AI agents as economic actors constitute a rapidly emerging paradigm in computational economics, empirical game theory, and agent-based modeling. AI agents in this context are defined as autonomous, goal-driven entities—implemented via reinforcement learning (RL)—that participate as principal economic actors (e.g., households, firms, and policymakers) within formally specified economic systems. These agents interact within dynamic environments, optimize diverse objectives, and adaptively form strategies that shape the outcomes of macroeconomic systems.

1. Formal Structure of Economic Systems with AI Agents

AI agents are embedded within an agent-based simulator that describes a multi-agent economic system as a Partially Observable Markov Game (POMG): $\Gamma = \langle \mathcal{N}, \mathcal{S}, \{\mathcal{A}_i\}, \{\mathcal{O}_i\}, \mathbb{T}, \{\mathbb{O}_i\}, \{R_i\}, \{\beta_i\}, H \rangle$ where:

  • $\mathcal{N}$: set of agents (e.g., heterogeneous households, firms, central bank, government).
  • $\mathcal{S}$: state space (e.g., inventories, macroeconomic variables).
  • $\mathcal{A}_i$: actions available to agent $i$ (e.g., consumption choices, tax rates, wage setting).
  • $\mathcal{O}_i$: agent observables (private and public signals).
  • $\mathbb{T}$: state transition function.
  • $\mathbb{O}_i$: observation functions.
  • $R_i$: reward (objective) for agent $i$.
  • $\beta_i$: discount factor.
  • $H$: episode horizon.

Agents repeatedly observe private/global states, select actions, receive rewards, and update policies. The design supports both agent heterogeneity and interaction across micro- and macroeconomic levels.
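
This observe-act-reward loop can be made concrete. The following is a minimal Python sketch of a POMG episode; the class, toy dynamics, observation noise, and stand-in policies are illustrative assumptions, not the study's implementation:

```python
import numpy as np

class EconomicPOMG:
    """Toy partially observable Markov game skeleton (illustrative only)."""

    def __init__(self, n_agents: int, horizon: int, seed: int = 0):
        self.n_agents = n_agents
        self.horizon = horizon  # H: episode horizon
        self.rng = np.random.default_rng(seed)
        self.state = None

    def reset(self) -> list[np.ndarray]:
        # S: global state, e.g., inventories and macro variables (stand-ins).
        self.state = self.rng.normal(size=8)
        return [self._observe(i) for i in range(self.n_agents)]

    def _observe(self, i: int) -> np.ndarray:
        # O_i: each agent sees a noisy, partial slice of the global state.
        return self.state[:4] + self.rng.normal(scale=0.1, size=4)

    def step(self, actions: list[np.ndarray]):
        # T: placeholder state transition driven by the joint action.
        self.state = self.state + 0.01 * np.concatenate(actions)[:8]
        rewards = [-float(np.sum(a ** 2)) for a in actions]  # R_i stand-ins
        return [self._observe(i) for i in range(self.n_agents)], rewards

# One episode: agents observe, act, and receive rewards; policy updates
# happen outside this loop, in the learning algorithm.
env = EconomicPOMG(n_agents=4, horizon=10)
obs = env.reset()
for t in range(env.horizon):
    actions = [np.tanh(o[:2]) for o in obs]  # stand-in policies
    obs, rewards = env.step(actions)
```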

2. Agent Typologies, Objectives, and Reward Functions

The simulator implements four primary actor types:

  • Households: Consumer-workers seeking to maximize utility over consumption $(c)$, labor disutility $(n)$, and savings $(m)$. The instantaneous reward function is

$$u(c, n, m; \gamma, \nu, \mu) = \frac{c^{1-\gamma}}{1-\gamma} - \nu n^2 + \mu\,\mathrm{sign}(m)\,\frac{|m|^{1-\gamma}}{1-\gamma}$$

with isoelastic preference parameters $\gamma, \nu, \mu$ controlling risk aversion, labor supply penalty, and savings preference.

  • Firms: Producers maximizing profits (sales revenue minus labor costs and inventory penalties):

$$r_\text{firm} = p_{t, j} \sum_i c_{t, ij} - w_{t, j} \sum_i n_{t, ij}\,\omega_{ij} - \chi_j\, p_{t, j}\, Y_{t+1, j}$$

where $p_{t,j}$: price, $w_{t,j}$: wage, $n_{t,ij}$: labor supplied, $\omega_{ij}$: skill, $\chi_j$: inventory risk, $Y_{t+1,j}$: next-period inventory.

  • Central Bank: Sets interest rates to minimize squared inflation deviation from target and maximize aggregate production:

$$r_\text{cb} = -(\pi_t - \pi^*)^2 + \lambda\left(\sum_j y_{t,j}\right)^2$$

where $\pi_t$: realized inflation, $\pi^*$: target inflation, $y_{t,j}$: production, and $\lambda$: tradeoff parameter.

  • Government: Sets tax rates and redistributes revenue to maximize a welfare function over household utilities:

$$r_\text{gov} = \sum_i l_{t,i}\, R_{t,i,\mathbf{H}}$$

with $l_{t,i}$ as equity weights and $R_{t,i,\mathbf{H}}$ as household utility.

Each objective induces a policy optimization problem under agent-specific or system-wide constraints.
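
The objectives above translate directly into code. A hedged sketch of the four reward functions follows; parameter defaults and variable names are illustrative assumptions, not the study's values:

```python
import numpy as np

def household_utility(c, n, m, gamma=1.5, nu=0.5, mu=0.1):
    """Isoelastic utility over consumption c, labor n, savings m.

    Assumes c > 0 and m != 0; gamma, nu, mu are illustrative defaults.
    """
    def crra(x):
        return x ** (1.0 - gamma) / (1.0 - gamma)
    return crra(c) - nu * n ** 2 + mu * np.sign(m) * crra(abs(m))

def firm_reward(p, consumption, w, labor, skill, chi, next_inventory):
    """Sales revenue minus wage bill minus inventory-risk penalty."""
    revenue = p * np.sum(consumption)        # p_{t,j} * sum_i c_{t,ij}
    wage_bill = w * np.sum(labor * skill)    # w_{t,j} * sum_i n_{t,ij} omega_{ij}
    return revenue - wage_bill - chi * p * next_inventory

def central_bank_reward(inflation, target, production, lam=0.25):
    """Penalizes squared inflation gap; rewards aggregate production."""
    return -(inflation - target) ** 2 + lam * np.sum(production) ** 2

def government_reward(equity_weights, household_utilities):
    """Equity-weighted social welfare over household utilities."""
    return float(np.dot(equity_weights, household_utilities))
```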

3. Learning Schemes: IMARL versus PSRO Approaches

The paper compares:

  • Independent Multi-Agent Reinforcement Learning (IMARL): Each agent independently optimizes a policy, treating the environment (including other agents) as stationary, thus ignoring strategic feedback effects. This frequently leads to non-equilibrium solutions where learned policies are not robust against unilateral deviations.
  • Policy Space Response Oracle (PSRO): A meta-algorithm from empirical game theory that constructs and expands agent policy sets iteratively:

    1. Accumulate a set of candidate policies per agent.
    2. Compute Nash equilibria (meta-strategies) over policies by empirically evaluating payoffs.
    3. For each agent, learn best responses to equilibrium mixtures of other agents.
    4. Enrich policy sets and repeat.
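
A skeleton of these four steps in Python, where `estimate_payoffs`, `nash_solver`, `best_response`, and `init_policy` are hypothetical callables supplied by the caller (an empirical payoff estimator, a meta-game Nash solver, an RL best-response oracle, and a policy initializer):

```python
def psro(env, n_agents, n_iters, estimate_payoffs, nash_solver,
         best_response, init_policy):
    """Skeleton of Policy Space Response Oracles (illustrative sketch)."""
    # Step 1: seed each agent's policy set with an initial policy.
    policy_sets = [[init_policy(i)] for i in range(n_agents)]
    meta_strategies = None
    for _ in range(n_iters):
        # Step 2: simulate joint policy profiles to fill the empirical
        # payoff tensor, then solve the meta-game for a mixed Nash.
        payoffs = estimate_payoffs(env, policy_sets)
        meta_strategies = nash_solver(payoffs)
        # Step 3: each agent trains a best response against the
        # equilibrium mixture over the other agents' policy sets.
        new_policies = [
            best_response(env, i, policy_sets, meta_strategies)
            for i in range(n_agents)
        ]
        # Step 4: enrich the policy sets and repeat.
        for i, pi in enumerate(new_policies):
            policy_sets[i].append(pi)
    return policy_sets, meta_strategies
```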

The resulting meta-strategy profile $\sigma = (\sigma_1, \dots, \sigma_n)$ approximates a (mixed) Nash equilibrium. The core equilibrium condition is

$$U_i(\sigma_i, \sigma_{-i}) \geq U_i(\tilde{\sigma}_i, \sigma_{-i}), \quad \forall\, \tilde{\sigma}_i \in \Sigma_i$$

where $U_i$ is the expected utility and $\Sigma_i$ is agent $i$'s policy space.

To evaluate equilibrium quality, the empirical game-theoretic regret metric is used:

$$\mathcal{R}(\sigma, \Sigma) = \sum_{i=1}^n \left[ \max_{\tilde{\sigma}_i \in \Sigma_i} U_i(\tilde{\sigma}_i, \sigma_{-i}) - U_i(\sigma_i, \sigma_{-i}) \right]$$

Low regret indicates that the strategies are near a Nash equilibrium.
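
Given the empirical payoff tensor of the meta-game, this regret can be computed directly. A toy two-agent example, using made-up matching-pennies-style payoffs and uniform mixtures:

```python
import numpy as np

def total_regret(payoffs, sigma):
    """Sum over agents of the best unilateral deviation gain (2-agent case).

    payoffs[i] is the (k1, k2) payoff matrix U_i over joint pure policies;
    sigma is a list of mixed meta-strategies, one per agent.
    """
    s1, s2 = sigma
    u1, u2 = payoffs[0], payoffs[1]
    eq1 = s1 @ u1 @ s2            # U_1(sigma_1, sigma_2)
    eq2 = s1 @ u2 @ s2            # U_2(sigma_1, sigma_2)
    dev1 = np.max(u1 @ s2)        # best pure-policy deviation for agent 1
    dev2 = np.max(s1 @ u2)        # best pure-policy deviation for agent 2
    return (dev1 - eq1) + (dev2 - eq2)

# Toy 2x2 meta-game (zero-sum, matching-pennies-like) with uniform mixtures.
U = np.array([[[1.0, -1.0], [-1.0, 1.0]],    # agent 1 payoffs
              [[-1.0, 1.0], [1.0, -1.0]]])   # agent 2 payoffs
sigma = [np.array([0.5, 0.5]), np.array([0.5, 0.5])]
print(total_regret(U, sigma))  # ~0.0: uniform play is the Nash equilibrium
```

Since the best mixed deviation is always attained at some pure policy, maximizing over pure deviations suffices; PSRO aims to drive this total regret toward zero as the policy sets grow.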

4. Equilibrium Analysis and Empirical Findings

Experiments with four agent types in a simulated economic environment show:

  • Learning efficiency: PSRO agents reach competitive rewards faster than IMARL and yield superior aggregate outcomes (e.g., central bank policies obtained via PSRO outperform those from IMARL).

  • Heterogeneity exploitation: Under PSRO, household behaviors align with their intrinsic heterogeneity (e.g., less-skilled households supply more labor), while firm price/wage dispersion contracts, suggesting convergence toward rational sorting and a more competitive equilibrium.
  • Equilibrium proximity: PSRO achieves notably lower total regret (0.17 versus 4.21 for IMARL), indicating the trained policies are substantially closer to equilibrium. IMARL policies invite unilateral deviation, while PSRO policies are robust.
  • Computational cost: PSRO incurs higher resource use due to the combinatorial expansion of policy evaluations, but is more sample-efficient and produces stable agent behaviors.

5. Synthesis with Empirical Game Theory and Economic Methodology

This framework integrates empirical game-theoretic analysis (EGTA) with deep reinforcement learning to endow agent-based economic simulation with robust equilibrium concepts:

  • It enables quantitative equilibrium analysis: RL-derived strategies are benchmarked for equilibrium regret, not just task performance.
  • It allows direct empirical comparison with classical economic equilibrium (e.g., Nash, competitive, Walrasian) in settings where analytic characterization is infeasible due to environment complexity and agent heterogeneity.
  • It supports policy design and assessment, enabling counterfactual and policy simulations that explicitly account for agents' adaptive responses and equilibrium effects, bridging agent-based micro-heterogeneity and macroeconomic phenomena.

6. Implications and Future Research Directions

The demonstrated methodology offers a general modeling and simulation platform for AI agents as economic actors under arbitrary market, regulatory, and policy regimes. It suggests that empirically computed equilibria via PSRO should supplant naive or purely independent RL approaches in multi-agent economic systems, especially where equilibrium stability or incentive compatibility is central.

The approach opens avenues for:

  • Studying the interaction of algorithmic agents and institutions (e.g., central bank, tax authority).
  • Embedding structural economic features (heterogeneity, network topology) in equilibrium-aware agent simulations.
  • Adopting EGTA/RL-based solution concepts in empirical macro-modeling, particularly for policy design, macroprudential analysis, and regulatory interventions where analytic solutions do not exist.

Summary Table of Core Concepts

| Concept | Instantiation in Study | Performance/Role |
|---|---|---|
| Agent Types | Households, Firms, Central Bank, Government | Distinct reward functions |
| Environment | Partially observable Markov game (POMG) with economic states, actions, observables | Supports interaction & heterogeneity |
| Learning Algorithm | PSRO vs. IMARL | PSRO yields lower regret, stable outcomes |
| Equilibrium Metric | Regret over empirical game | Validates proximity to Nash equilibrium |
| Policy Differentiation | Behavioral sorting under PSRO | Economic rationality, equilibrium sorting |

In sum, the empirical equilibrium paradigm for agent-based economic systems equips the modeling of AI agents as economic actors with theoretically principled, computationally tractable equilibrium concepts derived from empirical game theory. This enables rigorous analysis of policy, stability, and emergent macro-behavior in complex multi-agent economies, linking artificial intelligence, economics, and game theory.
