
AI Agents as Economic Actors

Updated 28 October 2025
  • AI Agents as Economic Actors are autonomous, goal-driven entities using reinforcement learning to simulate and optimize decisions in complex economic environments.
  • Multi-agent training via Policy Space Response Oracles (PSRO) achieves near-Nash equilibria that independent multi-agent RL (IMARL) fails to reach, yielding robust macroeconomic outcomes and competitive strategies.
  • Despite higher computational requirements, these approaches provide actionable insights for policy design by incorporating agent heterogeneity and adaptive strategy optimization.

AI agents as economic actors constitute a rapidly emerging paradigm in computational economics, empirical game theory, and agent-based modeling. AI agents in this context are defined as autonomous, goal-driven entities—implemented via reinforcement learning (RL)—that participate as principal economic actors (e.g., households, firms, and policymakers) within formally specified economic systems. These agents interact within dynamic environments, optimize diverse objectives, and adaptively form strategies that shape the outcomes of macroeconomic systems.

1. Formal Structure of Economic Systems with AI Agents

AI agents are embedded within an agent-based simulator that describes a multi-agent economic system as a Partially Observable Markov Game (POMG): $\Gamma = \langle \mathcal{N}, \mathcal{S}, \{\mathcal{A}_i\}, \{\mathcal{O}_i\}, \mathbb{T}, \{\mathbb{O}_i\}, \{R_i\}, \{\beta_i\}, H \rangle$ where:

  • $\mathcal{N}$: set of agents (e.g., heterogeneous households, firms, central bank, government).
  • $\mathcal{S}$: state space (e.g., inventories, macroeconomic variables).
  • $\mathcal{A}_i$: actions available to agent $i$ (e.g., consumption choices, tax rates, wage setting).
  • $\mathcal{O}_i$: agent observables (private and public signals).
  • $\mathbb{T}$: state transition function.
  • $\mathbb{O}_i$: observation functions.
  • $R_i$: reward (objective) for agent $i$.
  • $\beta_i$: discount factor.
  • $H$: episode horizon.

Agents repeatedly observe private/global states, select actions, receive rewards, and update policies. The design supports both agent heterogeneity and interaction across micro- and macroeconomic levels.
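
This observe-act-reward loop can be made concrete. The following is a minimal Python sketch of a POMG episode; the class, toy dynamics, observation noise, and stand-in policies are illustrative assumptions, not the study's implementation:

```python
import numpy as np

class EconomicPOMG:
    """Toy partially observable Markov game skeleton (illustrative only)."""

    def __init__(self, n_agents: int, horizon: int, seed: int = 0):
        self.n_agents = n_agents
        self.horizon = horizon  # H: episode horizon
        self.rng = np.random.default_rng(seed)
        self.state = None

    def reset(self) -> list[np.ndarray]:
        # S: global state, e.g., inventories and macro variables (stand-ins).
        self.state = self.rng.normal(size=8)
        return [self._observe(i) for i in range(self.n_agents)]

    def _observe(self, i: int) -> np.ndarray:
        # O_i: each agent sees a noisy, partial slice of the global state.
        return self.state[:4] + self.rng.normal(scale=0.1, size=4)

    def step(self, actions: list[np.ndarray]):
        # T: placeholder state transition driven by the joint action.
        self.state = self.state + 0.01 * np.concatenate(actions)[:8]
        rewards = [-float(np.sum(a ** 2)) for a in actions]  # R_i stand-ins
        return [self._observe(i) for i in range(self.n_agents)], rewards

# One episode: agents observe, act, and receive rewards; policy updates
# happen outside this loop, in the learning algorithm.
env = EconomicPOMG(n_agents=4, horizon=10)
obs = env.reset()
for t in range(env.horizon):
    actions = [np.tanh(o[:2]) for o in obs]  # stand-in policies
    obs, rewards = env.step(actions)
```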

2. Agent Typologies, Objectives, and Reward Functions

The simulator implements four primary actor types:

  • Households: Consumer-workers seeking to maximize utility over consumption $(c)$, labor disutility $(n)$, and savings $(m)$. The instantaneous reward function is

$$u(c, n, m; \gamma, \nu, \mu) = \frac{c^{1-\gamma}}{1-\gamma} - \nu n^2 + \mu\,\mathrm{sign}(m)\,\frac{|m|^{1-\gamma}}{1-\gamma}$$

with isoelastic preference parameters $\gamma, \nu, \mu$ controlling risk aversion, labor supply penalty, and savings preference.

  • Firms: Producers maximizing profits (sales revenue minus labor costs and inventory penalties):

$$r_\text{firm} = p_{t, j} \sum_i c_{t, ij} - w_{t, j} \sum_i n_{t, ij}\,\omega_{ij} - \chi_j\, p_{t, j}\, Y_{t+1, j}$$

where $p_{t,j}$: price, $w_{t,j}$: wage, $n_{t,ij}$: labor supplied, $\omega_{ij}$: skill, $\chi_j$: inventory risk, $Y_{t+1,j}$: next-period inventory.

  • Central Bank: Sets interest rates to minimize squared inflation deviation from target and maximize aggregate production:

$$r_\text{cb} = -(\pi_t - \pi^*)^2 + \lambda\left(\sum_j y_{t,j}\right)^2$$

where $\pi_t$: realized inflation, $\pi^*$: target inflation, $y_{t,j}$: production, and $\lambda$: tradeoff parameter.

  • Government: Sets tax rates and redistributes revenue to maximize a welfare function over household utilities:

$$r_\text{gov} = \sum_i l_{t,i}\, R_{t,i,\mathbf{H}}$$

with $l_{t,i}$ as equity weights and $R_{t,i,\mathbf{H}}$ as household utility.

Each objective induces a policy optimization problem under agent-specific or system-wide constraints.
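
The objectives above translate directly into code. A hedged sketch of the four reward functions follows; parameter defaults and variable names are illustrative assumptions, not the study's values:

```python
import numpy as np

def household_utility(c, n, m, gamma=1.5, nu=0.5, mu=0.1):
    """Isoelastic utility over consumption c, labor n, savings m.

    Assumes c > 0 and m != 0; gamma, nu, mu are illustrative defaults.
    """
    def crra(x):
        return x ** (1.0 - gamma) / (1.0 - gamma)
    return crra(c) - nu * n ** 2 + mu * np.sign(m) * crra(abs(m))

def firm_reward(p, consumption, w, labor, skill, chi, next_inventory):
    """Sales revenue minus wage bill minus inventory-risk penalty."""
    revenue = p * np.sum(consumption)        # p_{t,j} * sum_i c_{t,ij}
    wage_bill = w * np.sum(labor * skill)    # w_{t,j} * sum_i n_{t,ij} omega_{ij}
    return revenue - wage_bill - chi * p * next_inventory

def central_bank_reward(inflation, target, production, lam=0.25):
    """Penalizes squared inflation gap; rewards aggregate production."""
    return -(inflation - target) ** 2 + lam * np.sum(production) ** 2

def government_reward(equity_weights, household_utilities):
    """Equity-weighted social welfare over household utilities."""
    return float(np.dot(equity_weights, household_utilities))
```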

3. Learning Schemes: IMARL versus PSRO Approaches

The paper compares:

  • Independent Multi-Agent Reinforcement Learning (IMARL): Each agent independently optimizes a policy, treating the environment (including other agents) as stationary, thus ignoring strategic feedback effects. This frequently leads to non-equilibrium solutions where learned policies are not robust against unilateral deviations.
  • Policy Space Response Oracle (PSRO): A meta-algorithm from empirical game theory that constructs and expands agent policy sets iteratively:

    1. Accumulate a set of candidate policies per agent.
    2. Compute Nash equilibria (meta-strategies) over policies by empirically evaluating payoffs.
    3. For each agent, learn best responses to equilibrium mixtures of other agents.
    4. Enrich policy sets and repeat.
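
A skeleton of these four steps in Python, where `estimate_payoffs`, `nash_solver`, `best_response`, and `init_policy` are hypothetical callables supplied by the caller (an empirical payoff estimator, a meta-game Nash solver, an RL best-response oracle, and a policy initializer):

```python
def psro(env, n_agents, n_iters, estimate_payoffs, nash_solver,
         best_response, init_policy):
    """Skeleton of Policy Space Response Oracles (illustrative sketch)."""
    # Step 1: seed each agent's policy set with an initial policy.
    policy_sets = [[init_policy(i)] for i in range(n_agents)]
    meta_strategies = None
    for _ in range(n_iters):
        # Step 2: simulate joint policy profiles to fill the empirical
        # payoff tensor, then solve the meta-game for a mixed Nash.
        payoffs = estimate_payoffs(env, policy_sets)
        meta_strategies = nash_solver(payoffs)
        # Step 3: each agent trains a best response against the
        # equilibrium mixture over the other agents' policy sets.
        new_policies = [
            best_response(env, i, policy_sets, meta_strategies)
            for i in range(n_agents)
        ]
        # Step 4: enrich the policy sets and repeat.
        for i, pi in enumerate(new_policies):
            policy_sets[i].append(pi)
    return policy_sets, meta_strategies
```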

The resulting meta-strategy profile $\sigma = (\sigma_1, \dots, \sigma_n)$ approximates a (mixed) Nash equilibrium. The core equilibrium condition is

$$U_i(\sigma_i, \sigma_{-i}) \geq U_i(\tilde{\sigma}_i, \sigma_{-i}), \quad \forall\, \tilde{\sigma}_i \in \Sigma_i$$

where $U_i$ is the expected utility and $\Sigma_i$ is agent $i$'s policy space.

To evaluate equilibrium quality, the empirical game-theoretic regret metric is used:

$$\mathcal{R}(\sigma, \Sigma) = \sum_{i=1}^n \left[ \max_{\tilde{\sigma}_i \in \Sigma_i} U_i(\tilde{\sigma}_i, \sigma_{-i}) - U_i(\sigma_i, \sigma_{-i}) \right]$$

Low regret indicates that the strategies are near a Nash equilibrium.
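
Given the empirical payoff tensor of the meta-game, this regret can be computed directly. A toy two-agent example, using made-up matching-pennies-style payoffs and uniform mixtures:

```python
import numpy as np

def total_regret(payoffs, sigma):
    """Sum over agents of the best unilateral deviation gain (2-agent case).

    payoffs[i] is the (k1, k2) payoff matrix U_i over joint pure policies;
    sigma is a list of mixed meta-strategies, one per agent.
    """
    s1, s2 = sigma
    u1, u2 = payoffs[0], payoffs[1]
    eq1 = s1 @ u1 @ s2            # U_1(sigma_1, sigma_2)
    eq2 = s1 @ u2 @ s2            # U_2(sigma_1, sigma_2)
    dev1 = np.max(u1 @ s2)        # best pure-policy deviation for agent 1
    dev2 = np.max(s1 @ u2)        # best pure-policy deviation for agent 2
    return (dev1 - eq1) + (dev2 - eq2)

# Toy 2x2 meta-game (zero-sum, matching-pennies-like) with uniform mixtures.
U = np.array([[[1.0, -1.0], [-1.0, 1.0]],    # agent 1 payoffs
              [[-1.0, 1.0], [1.0, -1.0]]])   # agent 2 payoffs
sigma = [np.array([0.5, 0.5]), np.array([0.5, 0.5])]
print(total_regret(U, sigma))  # ~0.0: uniform play is the Nash equilibrium
```

Since the best mixed deviation is always attained at some pure policy, maximizing over pure deviations suffices; PSRO aims to drive this total regret toward zero as the policy sets grow.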

4. Equilibrium Analysis and Empirical Findings

Experiments with four agent types in a simulated economic environment show:

  • Learning efficiency: PSRO agents reach competitive rewards faster than IMARL and yield superior aggregate outcomes (e.g., central bank policies obtained via PSRO outperform those from IMARL).

  • Heterogeneity exploitation: Under PSRO, household behaviors align with their intrinsic heterogeneity (e.g., less-skilled households supply more labor), while firm price/wage dispersion contracts, suggesting convergence toward rational sorting and a more competitive equilibrium.
  • Equilibrium proximity: PSRO achieves notably lower total regret (0.17 versus 4.21 for IMARL), indicating the trained policies are substantially closer to equilibrium. IMARL policies invite unilateral deviation, while PSRO policies are robust.
  • Computational cost: PSRO incurs higher resource use due to the combinatorial expansion of policy evaluations, but is more sample-efficient and produces stable agent behaviors.

5. Synthesis with Empirical Game Theory and Economic Methodology

This framework integrates empirical game-theoretic analysis (EGTA) with deep reinforcement learning to endow agent-based economic simulation with robust equilibrium concepts:

  • It enables quantitative equilibrium analysis: RL-derived strategies are benchmarked for equilibrium regret, not just task performance.
  • It allows direct empirical comparison with classical economic equilibrium (e.g., Nash, competitive, Walrasian) in settings where analytic characterization is infeasible due to environment complexity and agent heterogeneity.
  • It supports policy design and assessment, enabling counterfactual and policy simulations that explicitly account for agents' adaptive responses and equilibrium effects, bridging agent-based micro-heterogeneity and macroeconomic phenomena.

6. Implications and Future Research Directions

The demonstrated methodology offers a general modeling and simulation platform for AI agents as economic actors under arbitrary market, regulatory, and policy regimes. It suggests that empirically computed equilibria via PSRO should supplant naive or purely independent RL approaches in multi-agent economic systems, especially where equilibrium stability or incentive compatibility is central.

The approach opens avenues for:

  • Studying the interaction of algorithmic agents and institutions (e.g., central bank, tax authority).
  • Embedding structural economic features (heterogeneity, network topology) in equilibrium-aware agent simulations.
  • Adopting EGTA/RL-based solution concepts in empirical macro-modeling, particularly for policy design, macroprudential analysis, and regulatory interventions where analytic solutions do not exist.

Summary Table of Core Concepts

| Concept | Instantiation in Study | Performance/Role |
|---|---|---|
| Agent Types | Households, Firms, Central Bank, Government | Distinct reward functions |
| Environment | Partially observable Markov game (POMG) with economic states, actions, observables | Supports interaction & heterogeneity |
| Learning Algorithm | PSRO vs. IMARL | PSRO yields lower regret, stable outcomes |
| Equilibrium Metric | Regret over empirical game | Validates proximity to Nash equilibrium |
| Policy Differentiation | Behavioral sorting under PSRO | Economic rationality, equilibrium sorting |

In sum, the empirical equilibrium paradigm for agent-based economic systems equips the modeling of AI agents as economic actors with theoretically principled, computationally tractable equilibrium concepts derived from empirical game theory. This enables rigorous analysis of policy, stability, and emergent macro-behavior in complex multi-agent economies, linking artificial intelligence, economics, and game theory.
