ABIDES-Economist: Multi-Agent Macroeconomic Simulator
- ABIDES-Economist is a multi-agent simulation environment that models macroeconomic dynamics with heterogeneous agents using reinforcement learning.
- It employs discrete-event simulation to capture detailed money flows and interactions among households, firms, central banks, and government.
- Its Gym-style interface and PPO training enable joint multi-agent reinforcement learning for exploring policy counterfactuals and emergent market dynamics.
ABIDES-Economist is a multi-agent simulation environment designed for the study of macroeconomic systems with explicit representation of learning economic agents. Built as an extension of the core ABIDES discrete-event simulator, ABIDES-Economist incorporates heterogeneous households, firms, a central bank, and a government, each acting via fixed behavioral rules or learned reinforcement learning (RL) policies. The platform features fine-grained money, production, and consumption flows, and is calibrated to U.S. macroeconomic data. Learning is supported via a Gym-style interface, enabling multi-agent reinforcement learning (MARL) at the agent, sectoral, or systemic level. ABIDES-Economist is specifically designed to bridge the gap between artificial intelligence and macroeconomic analysis by providing both realism and algorithmic flexibility (Dwarakanath et al., 14 Feb 2024).
1. Simulator Architecture and Economic Dynamics
ABIDES-Economist models the following agent types, each instantiated as an ABIDES agent:
- Households: Supply skill-weighted labor to firms, consume goods, pay income tax, earn interest on savings, and receive redistributive tax credits.
- Firms: Hire labor, produce goods subject to stochastic technology shocks, manage inventory, and set both wages and consumption-good prices.
- Central Bank: Observes aggregate inflation and output, setting a risk-free interest rate on household savings to stabilize inflation and stimulate production.
- Government: Levies an income tax on wages, allocates part of receipts as household tax credits, and optimizes a weighted social-welfare function.
At each quarterly timestep, agents act in sequence:
- Households observe prevailing economic variables and choose labor supply and consumption requests.
- Firms determine labor input, receive stochastic production shocks, update their technology factor via an autoregressive process driven by those shocks, and generate output from a CES production function over skill-weighted labor. Firms then allocate goods to households and update inventories.
- Firms set wages and prices for the next period.
- Households realize consumption, update savings, and pay tax liabilities.
- The Central Bank computes recent inflation and total output, adjusts the risk-free rate.
- The Government collects taxes and distributes credits.
Money flows, tax policy, production shocks, and labor allocations are modeled explicitly at each agent-environment interface, capturing macro-financial dynamics (Dwarakanath et al., 14 Feb 2024).
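The quarterly sequencing above can be sketched in plain Python. This is an illustrative sketch, not ABIDES-Economist code: all class names, parameter values (persistence, production exponent, fixed labor supply) are assumptions, and the rule-based behaviors stand in for learned policies.

```python
import random
from dataclasses import dataclass

@dataclass
class Household:
    savings: float = 1000.0
    labor: float = 0.0
    consumption: float = 0.0

@dataclass
class Firm:
    technology: float = 1.0
    inventory: float = 0.0
    wage: float = 25.0
    price: float = 10.0
    persistence: float = 0.9  # AR(1)-style shock persistence (assumed value)

def quarterly_step(households, firm, interest_rate, tax_rate, rng):
    """One quarter of the agent sequence described above (hypothetical names)."""
    # 1. Households choose labor supply and consumption requests (rule-based here).
    for h in households:
        h.labor = 40.0
        h.consumption = min(h.savings / firm.price, 50.0)

    # 2. Firm draws a Gaussian production shock and updates its technology
    #    factor via an autoregressive process (an assumed standard form).
    shock = rng.gauss(0.0, 0.1)
    firm.technology = firm.persistence * firm.technology + shock

    # 3. Output from aggregate labor (concave, CES-like exponent assumed);
    #    goods flow into inventory.
    output = max(firm.technology, 0.0) * sum(h.labor for h in households) ** 0.67
    firm.inventory += output

    # 4. Consumption is realized subject to inventory; household savings
    #    update with interest income, after-tax wages, and spending.
    for h in households:
        realized = min(h.consumption, firm.inventory)
        firm.inventory -= realized
        wage_income = firm.wage * h.labor
        h.savings += (interest_rate * h.savings
                      + (1 - tax_rate) * wage_income
                      - firm.price * realized)
```

In the full simulator these steps run as scheduled wakeups and message exchanges in the ABIDES kernel rather than a single function call.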
2. Reinforcement Learning and Markov Game Formalism
The model is formulated as a finite-horizon, partially observable Markov game. Each agent issues actions based on its own (possibly incomplete) observation set, transitions occur via stochastic or deterministic dynamics, and agent-specific rewards are issued according to microeconomic objectives.
Households observe their own savings together with prevailing economic variables (wages, prices, interest and tax rates) and act over labor allocations and consumption requests. Realized consumption is limited by available firm inventory, and the reward is a utility increasing and concave in consumption and decreasing in labor supplied.
Firms evaluate states comprising observed labor input, sales, shocks, previous technology, and current wage, price, and inventory levels, and adjust wage-price pairs. Their reward is net profit (revenue less the wage bill) minus an inventory holding cost.
The Central Bank seeks inflation stabilization around its target and output maximization; its reward penalizes squared deviations of inflation from target while rewarding aggregate output.
The Government receives state information on taxes and credit allocation. Its action comprises the income-tax rate and the household credit-allocation fractions. Its reward is a household-weighted social-welfare function incorporating a piecewise-linear function of household savings.
These components enable evaluation of micro-macro feedback under RL optimization (Dwarakanath et al., 14 Feb 2024).
3. Environmental Infrastructure and Interface
ABIDES-Economist is constructed upon the Python-based, event-driven ABIDES core. The kernel enforces message passing, agent wakeup scheduling, and discrete quanta of decision-making across agents.
- OpenAI Gym Wrapper: All learning agents’ policies are managed via a unified Gym interface. RL and non-RL agents may be freely mixed, and the action space design aligns with observed real-world data.
- Episodes and Timing: Each simulation episode covers a fixed number of quarters, supporting long-run economic learning and policy analysis.
- Exogenous Shocks: Technology shocks for firms are drawn from firm-specific Gaussian distributions each quarter, impacting firm-level productivity.
Communication among agents proceeds through requests and responses over agent-specific channels, allowing for modular extension and realistic endogeneity (Dwarakanath et al., 14 Feb 2024).
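A Gym-style multi-agent interface of the kind described above can be sketched as follows. This is a hypothetical minimal API, not the actual ABIDES-Economist wrapper: the class name, state variables, and single central-bank learning agent are all illustrative assumptions.

```python
class EconEnvSketch:
    """Minimal Gym-style environment sketch: reset() -> obs dict,
    step(actions) -> (obs, rewards, done, info), keyed by agent name."""

    def __init__(self, n_quarters=40):
        self.n_quarters = n_quarters
        self.t = 0
        self.state = {"inflation": 0.02, "output": 100.0, "rate": 0.01}

    def reset(self):
        self.t = 0
        self.state = {"inflation": 0.02, "output": 100.0, "rate": 0.01}
        return self._obs()

    def _obs(self):
        # Each learning agent sees its own (possibly partial) view of the state.
        return {"central_bank": (self.state["inflation"], self.state["output"])}

    def step(self, actions):
        # Apply the central bank's rate action, then advance one quarter;
        # rule-based agents would act inside the simulator between steps.
        self.state["rate"] = actions["central_bank"]
        self.t += 1
        rewards = {"central_bank": -(self.state["inflation"] - 0.02) ** 2}
        done = self.t >= self.n_quarters
        return self._obs(), rewards, done, {}
```

Because observations, rewards, and actions are dictionaries keyed by agent, RL-driven and rule-based agents can be mixed freely: rule-based agents simply never appear in the action dictionary.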
4. Agent Heterogeneity and Calibration
Heterogeneity across households, firms, and authorities is explicitly modeled and calibrated to U.S. macroeconomic data. Key parameters include:
| Agent Type | Key Heterogeneity Parameters | Calibration Source |
|---|---|---|
| Households | Skill matrix, utility parameters, discount factor | U.S. labor data, literature |
| Firms | Shock persistence, shock variance, production elasticity, inventory risk | Industry-specific estimates |
| Central Bank | Inflation target, discount factor | Macroeconomic policy benchmarks |
| Government | Tax redistribution share, social weights | Public finance literature |
Action spaces—labor, prices, wages—are discretized to match observed U.S. data, supporting empirical plausibility and tractability in RL (Dwarakanath et al., 14 Feb 2024).
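Discretizing a continuous action space onto a finite grid can be sketched as below. The grid bounds and sizes here are illustrative assumptions, not the paper's calibrated values.

```python
def discretize(low, high, n):
    """Evenly spaced action grid of n points on [low, high]."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]

# Illustrative grids (bounds and sizes are assumptions, not the calibration):
price_grid = discretize(5.0, 15.0, 11)   # consumption-good prices
wage_grid = discretize(15.0, 35.0, 11)   # wages
labor_grid = discretize(0.0, 60.0, 7)    # quarterly labor hours

def nearest_action(grid, value):
    """Snap a continuous value to the nearest discrete action."""
    return min(grid, key=lambda a: abs(a - value))
```

A finite grid lets each policy be a categorical distribution over actions, which is both tractable for PPO and easy to anchor to empirically observed price, wage, and labor levels.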
5. Experimental Scenarios and Empirical Findings
All learning is performed via Proximal Policy Optimization (PPO) using RLlib, with tuned, agent-specific learning rates. Observations are normalized and learning proceeds episodically. Two principal scenarios are reported:
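Observation normalization of the kind used with PPO can be sketched with an online mean/variance estimator (Welford's algorithm). This is a generic sketch of the technique, not RLlib's internal implementation.

```python
class RunningNormalizer:
    """Online mean/variance normalization of a scalar observation stream
    using Welford's algorithm -- a common preprocessing step for PPO."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def normalize(self, x, eps=1e-8):
        variance = self.m2 / self.n if self.n > 1 else 1.0
        return (x - self.mean) / (variance ** 0.5 + eps)
```

Normalizing per-feature keeps quantities on very different scales (interest rates near 0.01, output near 100) comparable as network inputs, which stabilizes policy-gradient training.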
Scenario 1: Heterogeneous Household Skills
- Two households with differing skill profiles, two firms (one technology, one agriculture), one RL-enabled Central Bank, and a rule-based Government.
- Simultaneous PPO training across all learning agents yielded convergence in a few hundred episodes.
- Main results: Skill-matched households direct more labor to firms aligning with their higher productivity, and skill-aligned savings accumulation is observed.
Scenario 2: Production Shock & Pricing Strategy
- All four agent types learn jointly; 10% of taxes redistributed as credits.
- At test time, Firm 1 receives a positive productivity shock.
- Without the shock, Firm 1 sets a higher price and sees lower consumption of its good, consistent with the law of demand.
- Upon shock, Firm 1 raises wages and cuts prices to clear excess inventory, increasing consumption.
- Government allocates more credits to lower-savings households, reinforcing redistributive objectives.
Stylized Fact Verification:
- The law of demand (inverse price-consumption relationship) is reproduced.
- The interest-rate response is positive in inflation and negative in production, aligning with monetary-policy intuition, as confirmed via SHAP analysis of the learned policy (Dwarakanath et al., 14 Feb 2024).
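The sign pattern in the SHAP analysis can be illustrated with a finite-difference sensitivity check on a stylized Taylor-rule-like policy. Both the rule and the coefficient values are hypothetical stand-ins for the learned central-bank policy, and finite differences are a crude proxy for SHAP attributions.

```python
def taylor_rule(inflation, output_gap, r_star=0.01, phi_pi=1.5, phi_y=-0.5):
    """Stylized rate-setting rule (illustrative stand-in for a learned
    policy): the rate rises with inflation and falls with excess production."""
    return r_star + phi_pi * inflation + phi_y * output_gap

def sensitivity(policy, base, feature, h=1e-4):
    """Central finite-difference sensitivity of the policy to one input
    feature -- a crude stand-in for a SHAP attribution."""
    lo, hi = dict(base), dict(base)
    lo[feature] -= h
    hi[feature] += h
    return (policy(**hi) - policy(**lo)) / (2 * h)
```

Evaluated at a baseline state, the inflation sensitivity is positive and the production sensitivity is negative, matching the sign pattern reported for the learned policy.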
6. Research Significance and Extension Trajectories
ABIDES-Economist supports, for the first time in Python, joint MARL across macroeconomically plausible populations of heterogeneous households, firms, and fiscal and monetary authorities. Out-of-equilibrium dynamics are recovered, supporting both learning and policy counterfactuals.
Potential future research directions include:
- Incorporating behavioral models (bounded rationality, hyperbolic discounting).
- Scaling to larger agent populations, sectors, and more granular environments.
- Counterfactual policy experiments (e.g., universal basic income, tax reforms).
- Integration with LLMs for agent behavior and social simulation.
- Coupling with ABM financial market simulators for systemic risk and stress-testing studies.
A plausible implication is that ABIDES-Economist will serve as a foundation for AI-driven, bottom-up policy design and the study of emergent learning in complex macroeconomic systems (Dwarakanath et al., 14 Feb 2024).