- The paper presents an end-to-end agentic system that converts economic intuition into rigorously executable simulation experiments.
- It employs a modular workflow integrating literature retrieval, formal experimental design, and reproducible simulation execution.
- Quantitative evaluations demonstrate significant gains in hypothesis quality, notably in literature grounding and novelty.
AgentEconomist: End-to-End Translation of Economic Intuitions into Computational Experiments
Motivation and Problem Setting
Agent-based modeling (ABM) in economics enables simulation-driven inquiry into complex, emergent dynamics. However, turning high-level economic intuition into rigorous, executable experiments remains difficult. The idea-to-experiment transition is hampered by (1) tacit methodological knowledge that is inaccessible to novices, (2) the friction of translating ideas into formal simulation code, and (3) a lack of systematic epistemic context for cumulative refinement. Existing LLM agents and scientific copilots either automate isolated workflow segments or treat research as a black-box pipeline, neglecting the interactive, theory-grounded sense-making and operational translation that meaningful economic analysis requires.
System Architecture and Interactive Workflow
AgentEconomist addresses these gaps via a modular, human-in-the-loop agentic framework that grounds economic intuition in literature, formalizes hypotheses for simulation, and executes experiments with iterative traceability.
Figure 1: AgentEconomist decomposes the research workflow into literature-grounded idea development, design formalization, and simulation execution; a structured memory module and standardized simulation tools enable iterative, human-guided inquiry.
The architecture comprises three coordinated stages:
- Idea Development: Economic intuition is mapped to simulator-executable constructs through literature-grounded retrieval and retrieval-augmented generation (RAG) over a curated corpus of 13,000+ economics papers. Hypothesis proposals are bound to operational variables, and feasibility constraints are enforced to ensure only implementable claims progress.
- Experimental Design: Validated hypotheses are turned into formal, simulator-ready experimental protocols. This includes precise parameterization, definition of experimental groups, and alignment of outcome metrics so that the hypothesized effects are identifiable and trackable in simulation output.
- Experimental Execution: The system leverages an MCP-based toolbox over the AgentEconomy simulation environment, standardizing orchestration and artifact collection for reproducible execution and metric-aligned visualization.
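The hand-off between the first two stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the names `Hypothesis`, `ExperimentDesign`, `feasibility_issues`, and `design_experiment`, and the parameter and metric names, are all hypothetical. It shows a hypothesis bound to operational variables, a feasibility gate that lets only implementable claims through, and a control/treatment design derived from the validated hypothesis.

```python
from dataclasses import dataclass, field

# Hypothetical parameters and metrics a simulator might expose
# (assumed for illustration; not taken from the paper).
SIMULATOR_PARAMETERS = {"innovation_subsidy_rate", "income_tax_rate"}
SIMULATOR_METRICS = {"household_consumption", "household_income", "household_wealth"}


@dataclass
class Hypothesis:
    statement: str
    parameters: dict[str, float]   # operational variables the hypothesis manipulates
    metrics: list[str]             # outcomes the hypothesis makes claims about
    citations: list[str] = field(default_factory=list)  # supporting literature


def feasibility_issues(h: Hypothesis) -> list[str]:
    """Return the reasons a hypothesis cannot be run; empty means feasible."""
    issues = []
    for name in h.parameters:
        if name not in SIMULATOR_PARAMETERS:
            issues.append(f"unknown parameter: {name}")
    for name in h.metrics:
        if name not in SIMULATOR_METRICS:
            issues.append(f"untracked metric: {name}")
    if not h.citations:
        issues.append("no supporting literature attached")
    return issues


@dataclass
class ExperimentDesign:
    hypothesis: Hypothesis
    groups: dict[str, dict[str, float]]  # group name -> full parameter set
    metrics: list[str]


def design_experiment(h: Hypothesis, baseline: dict[str, float]) -> ExperimentDesign:
    """Build a control/treatment design, refusing infeasible hypotheses."""
    issues = feasibility_issues(h)
    if issues:
        raise ValueError("infeasible hypothesis: " + "; ".join(issues))
    groups = {
        "control": dict(baseline),
        "treatment": {**baseline, **h.parameters},  # override only what is tested
    }
    return ExperimentDesign(hypothesis=h, groups=groups, metrics=list(h.metrics))
```

Keeping the feasibility check as a function that returns reasons, rather than a bare boolean, would let a copilot explain to the user why a proposal was rejected and what to change.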
The process is continuously supported by a structured memory module, which records context, design rationales, parameter histories, and results for multi-turn interpretability and cumulative reasoning.
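One way to picture such a memory module is as an append-only log that can be queried by stage or by parameter. The sketch below is a minimal illustration, assuming a simple in-process store; `MemoryRecord`, `ResearchMemory`, and their fields are hypothetical names, and the paper does not specify the actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class MemoryRecord:
    turn: int                      # position in the session, starting at 1
    stage: str                     # e.g. "idea", "design", "execution"
    rationale: str                 # why this step was taken
    parameters: dict[str, float] = field(default_factory=dict)
    results: dict[str, Any] = field(default_factory=dict)


class ResearchMemory:
    """Append-only session log (illustrative; not the paper's schema)."""

    def __init__(self) -> None:
        self._records: list[MemoryRecord] = []

    def record(self, stage: str, rationale: str,
               parameters: Optional[dict[str, float]] = None,
               results: Optional[dict[str, Any]] = None) -> MemoryRecord:
        entry = MemoryRecord(
            turn=len(self._records) + 1,
            stage=stage,
            rationale=rationale,
            parameters=dict(parameters or {}),
            results=dict(results or {}),
        )
        self._records.append(entry)
        return entry

    def parameter_history(self, name: str) -> list[tuple[int, float]]:
        """All (turn, value) pairs in which a parameter was set, oldest first."""
        return [(r.turn, r.parameters[name])
                for r in self._records if name in r.parameters]

    def latest(self, stage: str) -> Optional[MemoryRecord]:
        """Most recent record for a stage, or None if the stage never ran."""
        for r in reversed(self._records):
            if r.stage == stage:
                return r
        return None
```

Because entries are never overwritten, a later turn can recover both the current configuration and the sequence of design decisions that led to it.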
Figure 2: The research interface integrates a copilot dialogue panel, workflow navigation, and structured views for ideas, configuration, and results, enhancing transparency and collaborative research operations.
Figure 3: AgentEconomist’s workflow covers the full cycle: capturing an intuition, iterating on ideas and vetting hypotheses, configuring and executing experiments, and then verifying hypotheses and exploring further.
Simulation Engine: AgentEconomy
Crucial to system efficacy is AgentEconomy, a comprehensive agent-based laboratory simulating households, firms, government, and banking—each with LLM-driven behavioral logic initialized from empirical microdata. The simulation provides realistic, high-dimensional environments with emergent labor and product market dynamics. This setting allows for testing of hypotheses that require complex micro-to-macro transmission and nuanced behavioral shifts in response to policy or institutional change.
Evaluation: Hypothesis Quality and Workflow Enhancement
Quantitative Assessment
A mixed-method evaluation, including paired expert and LLM (anonymous referee) scoring, benchmarked AgentEconomist against state-of-the-art generalist LLM assistants (e.g., GPT-5.2, Gemini 3-Pro) across eight economic hypothesis-quality dimensions: Clarity/Structure, Literature Grounding, Economic Logic, Mechanism Completeness, Hypothesis Specificity, Novelty/Insight, Relevance/Significance, and Simulation Feasibility.

Figure 4: AgentEconomist delivers consistently higher scores than the baseline, especially for Literature Grounding and Novelty/Insight, as evaluated by both LLM-based referees and expert users.
Key findings and numerical results:
- Literature Grounding: LLM-based evaluation improved from 3.36 (baseline) to 4.93, p<0.01; human evaluation from 3.11 to 4.50, p<0.05.
- Novelty/Insight: LLM-based improvement from 3.00 to 4.43, p<0.01; human evaluation from 3.12 to 4.05, p<0.05.
The system’s structured, parameter-first workflow led to substantive gains in hypothesis implementability and originality over baseline LLMs, which tended to generate more generic, less actionable ideas.
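To make the size of these gains concrete, the reported means can be turned into absolute and relative improvements. The snippet below uses only the four score pairs quoted above; the helper names `absolute_gain` and `relative_gain_pct` are illustrative, and the snippet does not reproduce the significance tests, whose method is not described here.

```python
# (dimension, evaluator, baseline mean, AgentEconomist mean) as reported above.
REPORTED = [
    ("Literature Grounding", "LLM referee", 3.36, 4.93),
    ("Literature Grounding", "human expert", 3.11, 4.50),
    ("Novelty/Insight", "LLM referee", 3.00, 4.43),
    ("Novelty/Insight", "human expert", 3.12, 4.05),
]


def absolute_gain(baseline: float, system: float) -> float:
    """Difference in mean score, rounded to two decimals."""
    return round(system - baseline, 2)


def relative_gain_pct(baseline: float, system: float) -> float:
    """Gain as a percentage of the baseline mean, rounded to one decimal."""
    return round(100.0 * (system - baseline) / baseline, 1)


for dimension, evaluator, baseline, system in REPORTED:
    print(f"{dimension:21s} {evaluator:13s} "
          f"+{absolute_gain(baseline, system):.2f} points "
          f"(+{relative_gain_pct(baseline, system):.1f}%)")
```

The absolute gains come out to 1.57 and 1.39 points for Literature Grounding and 1.43 and 0.93 points for Novelty/Insight, so the human-rated novelty improvement is the smallest of the four.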
Qualitative User Study
Expert users highlighted three major themes:
- Grounded Trust: Confidence was enhanced by traceable, literature-backed reasoning.
- Operationalization Support: Users benefited from systematic translation of intuition into simulator-ready designs.
- Mechanistic Scaffolding: The system’s explicit causal and behavioral mapping facilitated deeper engagement with theory.
Users also noted practical challenges—mainly simulation latency, opacity of intermediate states, and some limitations in long-horizon dialogue continuity.
Practical Demonstration: Real-Session Case Study
A typical session began with a high-level intuition about the effects of innovation-support policy on household consumption. AgentEconomist retrieved peer-reviewed economic studies, synthesized candidate mechanisms, and automatically built a controlled experiment comparing simulation runs with and without the innovation-support policy activated.
Figure 5: Innovation-support policy drives positive income and consumption dynamics in simulation, in line with theoretical predictions.
Results demonstrated measurable impacts under policy change (e.g., +4.3% consumption, +27.9% income, +21.7% wealth in the treatment group), generating interpretable, mechanism-consistent emergent outcomes. The system maintained a transparent, iterative process with full artifact traceability and user parameterization control.
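A treatment effect of this kind is typically summarized as a percent change relative to the control group. The sketch below assumes that reading of the reported figures. The control means are normalized to 100 and the treatment means are chosen so that the computation reproduces the reported percentages; they are placeholders, not simulation output from the paper.

```python
def percent_change(control_mean: float, treatment_mean: float) -> float:
    """Relative difference of treatment versus control, in percent."""
    if control_mean == 0:
        raise ValueError("control mean must be non-zero")
    return 100.0 * (treatment_mean - control_mean) / control_mean


# Placeholder group means (control normalized to 100); NOT data from the paper.
control = {"consumption": 100.0, "income": 100.0, "wealth": 100.0}
treatment = {"consumption": 104.3, "income": 127.9, "wealth": 121.7}

effects = {
    metric: round(percent_change(control[metric], treatment[metric]), 1)
    for metric in control
}
print(effects)
```

In a real session the group means would come from the collected simulation artifacts, ideally averaged over several runs so that the difference is not driven by a single random seed.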
Theoretical and Practical Implications
AgentEconomist’s architecture exemplifies a shift from generic, outcome-focused LLM automation to domain-grounded, process-oriented co-piloting. By tightly coupling hypothesis formation to simulation feasibility and leveraging explicit, literature-informed design, it enables high-throughput, iterative research in complex economic domains.
Practically, the system reduces the technical and cognitive overhead of experiment design, accelerates hypothesis vetting, and produces reusable epistemic documentation for cumulative exploration. Theoretically, its explicit modeling of human–agent complementarity, structured memory, and MCP-based execution orchestration offers a blueprint for scientific automation infrastructure in other domains that require high interpretability and domain-specific rigor.
Conclusion
AgentEconomist operationalizes end-to-end, literature-grounded translation of uncertain economic intuition into executable simulation experiments with human-centered interpretability and control. Its empirical superiority in hypothesis quality—especially in literature grounding and novelty—demonstrates the value of tight domain coupling and structured, iterative workflows for computational social science. The system provides a robust template for AI-accelerated scientific discovery infrastructure, advocating for epistemic scaffolding and collaboration rather than opaque automation. Future work should further optimize system responsiveness, expand empirical evaluation, and generalize principles for other domains where interpretability, traceability, and iterative refinement are critical.